Test for-next (regular) #1268

Open
wants to merge 10,000 commits into
base: ci

This pull request is big! We’re only showing the most recent 250 commits.

Commits on Nov 5, 2024

  1. net: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns()
    
    flow->irq is initialized to 0 which is a valid IRQ. Set it to -EINVAL
    in error path of am65_cpsw_nuss_init_rx_chns() so we do not try
    to free an unallocated IRQ in am65_cpsw_nuss_remove_rx_chns().
    
    If the user tries to change the number of RX queues and
    am65_cpsw_nuss_init_rx_chns() fails for any reason, the warning triggers
    the next time the user changes the number of RX queues.
    
    root@am62xx-evm:~# ethtool -L eth0 rx 3
    [   40.385293] am65-cpsw-nuss 8000000.ethernet: set new flow-id-base 19
    [   40.393211] am65-cpsw-nuss 8000000.ethernet: Failed to init rx flow2
    netlink error: Invalid argument
    root@am62xx-evm:~# ethtool -L eth0 rx 2
    [   82.306427] ------------[ cut here ]------------
    [   82.311075] WARNING: CPU: 0 PID: 378 at kernel/irq/devres.c:144 devm_free_irq+0x84/0x90
    [   82.469770] Call trace:
    [   82.472208]  devm_free_irq+0x84/0x90
    [   82.475777]  am65_cpsw_nuss_remove_rx_chns+0x6c/0xac [ti_am65_cpsw_nuss]
    [   82.482487]  am65_cpsw_nuss_update_tx_rx_chns+0x2c/0x9c [ti_am65_cpsw_nuss]
    [   82.489442]  am65_cpsw_set_channels+0x30/0x4c [ti_am65_cpsw_nuss]
    [   82.495531]  ethnl_set_channels+0x224/0x2dc
    [   82.499713]  ethnl_default_set_doit+0xb8/0x1b8
    [   82.504149]  genl_family_rcv_msg_doit+0xc0/0x124
    [   82.508757]  genl_rcv_msg+0x1f0/0x284
    [   82.512409]  netlink_rcv_skb+0x58/0x130
    [   82.516239]  genl_rcv+0x38/0x50
    [   82.519374]  netlink_unicast+0x1d0/0x2b0
    [   82.523289]  netlink_sendmsg+0x180/0x3c4
    [   82.527205]  __sys_sendto+0xe4/0x158
    [   82.530779]  __arm64_sys_sendto+0x28/0x38
    [   82.534782]  invoke_syscall+0x44/0x100
    [   82.538526]  el0_svc_common.constprop.0+0xc0/0xe0
    [   82.543221]  do_el0_svc+0x1c/0x28
    [   82.546528]  el0_svc+0x28/0x98
    [   82.549578]  el0t_64_sync_handler+0xc0/0xc4
    [   82.553752]  el0t_64_sync+0x190/0x194
    [   82.557407] ---[ end trace 0000000000000000 ]---
    
    Fixes: da70d18 ("net: ethernet: ti: am65-cpsw: Introduce multi queue Rx")
    Signed-off-by: Roger Quadros <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    rogerq authored and Paolo Abeni committed Nov 5, 2024
    Commit ba3b7ac
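The idea behind the am65-cpsw fix above, marking an IRQ that was never requested with a negative value so that teardown skips it, can be sketched in user-space C. This is only an illustration under stated assumptions: `struct rx_flow_sketch`, `request_irq_sketch()`, `free_irq_sketch()` and the two counters are hypothetical stand-ins, not the driver's code, and how the sketch marks the remaining flows on failure is a choice made here, not something the commit message specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_FLOWS 4
#define FIRST_FAKE_IRQ 100

/* Hypothetical stand-in for a per-flow structure; not the driver's type. */
struct rx_flow_sketch {
    int irq;                /* negative means "no IRQ was requested" */
};

static int frees_ok;        /* frees of IRQs that were really requested */
static int frees_bogus;     /* frees of IRQs that were never requested */

/* Hypothetical allocator: hands out 100, 101, ... and fails at fail_at. */
static int request_irq_sketch(int idx, int fail_at)
{
    return idx == fail_at ? -EINVAL : FIRST_FAKE_IRQ + idx;
}

static void free_irq_sketch(int irq)
{
    if (irq >= FIRST_FAKE_IRQ)
        frees_ok++;
    else
        frees_bogus++;      /* the situation the kernel warns about */
}

/* Set up all flows; on failure leave every flow either freed or marked. */
int init_rx_flows_sketch(struct rx_flow_sketch *flows, int fail_at)
{
    assert(flows != NULL);

    for (int i = 0; i < NUM_FLOWS; i++) {
        int irq = request_irq_sketch(i, fail_at);

        if (irq < 0) {
            /* Unwind the flows that did get an IRQ. */
            for (int j = 0; j < i; j++) {
                free_irq_sketch(flows[j].irq);
                flows[j].irq = -EINVAL;
            }
            /* Mark the rest so that a later remove skips them. */
            for (int j = i; j < NUM_FLOWS; j++)
                flows[j].irq = -EINVAL;
            return irq;
        }
        flows[i].irq = irq;
    }
    return 0;
}

/* Free only IRQs marked as allocated, then mark them unallocated. */
void remove_rx_flows_sketch(struct rx_flow_sketch *flows)
{
    for (int i = 0; i < NUM_FLOWS; i++) {
        if (flows[i].irq < 0)
            continue;
        free_irq_sketch(flows[i].irq);
        flows[i].irq = -EINVAL;
    }
}
```

With zero-initialized flows, a failed init followed by a remove never reaches `free_irq_sketch()` with the value 0, which is the case the commit message describes as freeing an unallocated IRQ.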
  2. Merge branch 'net-ethernet-ti-am65-cpsw-fixes-to-multi-queue-rx-feature'

    Roger Quadros says:
    
    ====================
    net: ethernet: ti: am65-cpsw: Fixes to multi queue RX feature
    
    On J7 platforms, setting up multiple RX flows was failing
    as the RX free descriptor ring 0 is shared among all flows
    and we did not allocate enough elements in the RX free descriptor
    ring 0 to accommodate all RX flows. Patch 1 fixes this.
    
    The second patch fixes a warning if there was any error in
    am65_cpsw_nuss_init_rx_chns() and am65_cpsw_nuss_cleanup_rx_chns()
    was called after that.
    
    Signed-off-by: Roger Quadros <[email protected]>
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Paolo Abeni committed Nov 5, 2024
    Commit 9eaff63
  3. drm/amdgpu: Fix DPX valid mode check on GC 9.4.3

    For DPX mode, the number of memory partitions supported should be less
    than or equal to 2.
    
    Fixes: 1589c82 ("drm/amdgpu: Check memory ranges for valid xcp mode")
    Signed-off-by: Lijo Lazar <[email protected]>
    Reviewed-by: Hawking Zhang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit 990c4f5)
    Cc: [email protected]
    Lijo Lazar authored and alexdeucher committed Nov 5, 2024
    Commit 3ce3f85
  4. drm/amdgpu: Adjust debugfs register access permissions

    Regular users shouldn't have read access.
    
    Reviewed-by: Yang Wang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit c0cfd2e)
    Cc: [email protected]
    alexdeucher committed Nov 5, 2024
    Commit b46dadf
  5. drm/amdgpu: Adjust debugfs eviction and IB access permissions

    Users should not be able to run these.
    
    Reviewed-by: Yang Wang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit 7ba9395)
    Cc: [email protected]
    alexdeucher committed Nov 5, 2024
    Commit f790a2c
  6. drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()

    Avoid a possible buffer overflow if size is larger than 4K.
    
    Reviewed-by: Yang Wang <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    (cherry picked from commit f5d873f)
    Cc: [email protected]
    alexdeucher committed Nov 5, 2024
    Commit 4d75b94
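The kind of check the gprwave commit above describes can be sketched as follows. This is a user-space illustration, not the amdgpu code: `gprwave_read_sketch()` and `fill_scratch()` are hypothetical names, the 4096-byte limit is taken from the "4K" in the commit message, and the extra 4-byte alignment check is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_BYTES 4096   /* the 4K limit named in the commit message */

/* Hypothetical data source standing in for the hardware read. */
static void fill_scratch(uint32_t *scratch, size_t words)
{
    for (size_t i = 0; i < words; i++)
        scratch[i] = (uint32_t)i;
}

/*
 * Copy `size` bytes into dst via a fixed 4K scratch buffer.
 * Returns the number of bytes copied, or -EINVAL for a rejected size.
 */
long gprwave_read_sketch(void *dst, size_t size)
{
    uint32_t scratch[SCRATCH_BYTES / sizeof(uint32_t)];

    assert(dst != NULL);

    /*
     * Reject requests the scratch buffer cannot hold. The alignment
     * check is an assumption of this sketch, so that only whole
     * 32-bit words are copied.
     */
    if (size > SCRATCH_BYTES || (size & 0x3))
        return -EINVAL;

    fill_scratch(scratch, size / sizeof(uint32_t));
    memcpy(dst, scratch, size);
    return (long)size;
}
```

The point is only that the size is validated before anything is written into the fixed-size buffer, so an oversized request fails early instead of overflowing it.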
  7. ASoC: stm32: spdifrx: fix dma channel release in stm32_spdifrx_remove

    If requesting the ctrl_chan DMA channel fails, ctrl_chan is not NULL,
    so releasing the DMA channel leads to the following issue:
    [    4.879000] st,stm32-spdifrx 500d0000.audio-controller:
    dma_request_slave_channel error -19
    [    4.888975] Unable to handle kernel NULL pointer dereference
    at virtual address 000000000000003d
    [...]
    [    5.096577] Call trace:
    [    5.099099]  dma_release_channel+0x24/0x100
    [    5.103235]  stm32_spdifrx_remove+0x24/0x60 [snd_soc_stm32_spdifrx]
    [    5.109494]  stm32_spdifrx_probe+0x320/0x4c4 [snd_soc_stm32_spdifrx]
    
    To avoid this issue, release channel only if the pointer is valid.
    
    Fixes: 794df94 ("ASoC: stm32: spdifrx: manage rebind issue")
    Signed-off-by: Amelie Delaunay <[email protected]>
    Signed-off-by: Olivier Moysan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    ADESTM authored and broonie committed Nov 5, 2024
    Commit 9bb4af4
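The spdifrx commit above notes that ctrl_chan is not NULL after a failed request. One plausible reading, and it is an assumption here, is that the pointer holds an encoded error value (the log shows error -19, which is -ENODEV on Linux), in the style of the kernel's ERR_PTR()/IS_ERR() helpers. The sketch below re-creates that convention in user space; `err_ptr()`, `is_err()` and the `*_sketch` types are simplified, hypothetical stand-ins, not the kernel or driver definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified user-space stand-ins for the kernel's ERR_PTR()/IS_ERR(). */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static int is_err(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Hypothetical channel and device types for illustration only. */
struct dma_chan_sketch {
    int released;
};

struct spdifrx_sketch {
    struct dma_chan_sketch *ctrl_chan;
};

static void release_channel_sketch(struct dma_chan_sketch *chan)
{
    assert(chan != NULL);
    chan->released++;   /* dereferences chan: fatal on an error pointer */
}

/* Release the channel only if the pointer is a real channel. */
void spdifrx_remove_sketch(struct spdifrx_sketch *s)
{
    if (s->ctrl_chan && !is_err(s->ctrl_chan))
        release_channel_sketch(s->ctrl_chan);
}
```

A plain NULL check would not help in this situation, because an encoded error pointer is non-NULL yet still not safe to dereference.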
  8. mm/slab: fix warning caused by duplicate kmem_cache creation in kmem_buckets_create
    
    Commit b035f5a ("mm: slab: reduce the kmalloc() minimum alignment
    if DMA bouncing possible") reduced ARCH_KMALLOC_MINALIGN to 8 on arm64.
    However, with KASAN_HW_TAGS enabled, arch_slab_minalign() becomes 16.
    This causes kmalloc_caches[*][8] to be aliased to kmalloc_caches[*][16],
    resulting in kmem_buckets_create() attempting to create a kmem_cache for
    size 16 twice. This duplication triggers warnings on boot:
    
    [    2.325108] ------------[ cut here ]------------
    [    2.325135] kmem_cache of name 'memdup_user-16' already exists
    [    2.325783] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:107 __kmem_cache_create_args+0xb8/0x3b0
    [    2.327957] Modules linked in:
    [    2.328550] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5mm-unstable-arm64+ #12
    [    2.328683] Hardware name: QEMU QEMU Virtual Machine, BIOS 2024.02-2 03/11/2024
    [    2.328790] pstate: 61000009 (nZCv daif -PAN -UAO -TCO +DIT -SSBS BTYPE=--)
    [    2.328911] pc : __kmem_cache_create_args+0xb8/0x3b0
    [    2.328930] lr : __kmem_cache_create_args+0xb8/0x3b0
    [    2.328942] sp : ffff800083d6fc50
    [    2.328961] x29: ffff800083d6fc50 x28: f2ff0000c1674410 x27: ffff8000820b0598
    [    2.329061] x26: 000000007fffffff x25: 0000000000000010 x24: 0000000000002000
    [    2.329101] x23: ffff800083d6fce8 x22: ffff8000832222e8 x21: ffff800083222388
    [    2.329118] x20: f2ff0000c1674410 x19: f5ff0000c16364c0 x18: ffff800083d80030
    [    2.329135] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    [    2.329152] x14: 0000000000000000 x13: 0a73747369786520 x12: 79646165726c6120
    [    2.329169] x11: 656820747563205b x10: 2d2d2d2d2d2d2d2d x9 : 0000000000000000
    [    2.329194] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
    [    2.329210] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
    [    2.329226] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
    [    2.329291] Call trace:
    [    2.329407]  __kmem_cache_create_args+0xb8/0x3b0
    [    2.329499]  kmem_buckets_create+0xfc/0x320
    [    2.329526]  init_user_buckets+0x34/0x78
    [    2.329540]  do_one_initcall+0x64/0x3c8
    [    2.329550]  kernel_init_freeable+0x26c/0x578
    [    2.329562]  kernel_init+0x3c/0x258
    [    2.329574]  ret_from_fork+0x10/0x20
    [    2.329698] ---[ end trace 0000000000000000 ]---
    
    [    2.403704] ------------[ cut here ]------------
    [    2.404716] kmem_cache of name 'msg_msg-16' already exists
    [    2.404801] WARNING: CPU: 2 PID: 1 at mm/slab_common.c:107 __kmem_cache_create_args+0xb8/0x3b0
    [    2.404842] Modules linked in:
    [    2.404971] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.12.0-rc5mm-unstable-arm64+ #12
    [    2.405026] Tainted: [W]=WARN
    [    2.405043] Hardware name: QEMU QEMU Virtual Machine, BIOS 2024.02-2 03/11/2024
    [    2.405057] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [    2.405079] pc : __kmem_cache_create_args+0xb8/0x3b0
    [    2.405100] lr : __kmem_cache_create_args+0xb8/0x3b0
    [    2.405111] sp : ffff800083d6fc50
    [    2.405115] x29: ffff800083d6fc50 x28: fbff0000c1674410 x27: ffff8000820b0598
    [    2.405135] x26: 000000000000ffd0 x25: 0000000000000010 x24: 0000000000006000
    [    2.405153] x23: ffff800083d6fce8 x22: ffff8000832222e8 x21: ffff800083222388
    [    2.405169] x20: fbff0000c1674410 x19: fdff0000c163d6c0 x18: ffff800083d80030
    [    2.405185] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    [    2.405201] x14: 0000000000000000 x13: 0a73747369786520 x12: 79646165726c6120
    [    2.405217] x11: 656820747563205b x10: 2d2d2d2d2d2d2d2d x9 : 0000000000000000
    [    2.405233] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
    [    2.405248] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
    [    2.405271] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
    [    2.405287] Call trace:
    [    2.405293]  __kmem_cache_create_args+0xb8/0x3b0
    [    2.405305]  kmem_buckets_create+0xfc/0x320
    [    2.405315]  init_msg_buckets+0x34/0x78
    [    2.405326]  do_one_initcall+0x64/0x3c8
    [    2.405337]  kernel_init_freeable+0x26c/0x578
    [    2.405348]  kernel_init+0x3c/0x258
    [    2.405360]  ret_from_fork+0x10/0x20
    [    2.405370] ---[ end trace 0000000000000000 ]---
    
    To address this, alias kmem_cache for sizes smaller than min alignment
    to the aligned sized kmem_cache, as done with the default system kmalloc
    bucket.
    
    Fixes: b32801d ("mm/slab: Introduce kmem_buckets_create() and family")
    Cc: <[email protected]> # v6.11+
    Signed-off-by: Koichiro Den <[email protected]>
    Reviewed-by: Catalin Marinas <[email protected]>
    Tested-by: Catalin Marinas <[email protected]>
    Signed-off-by: Vlastimil Babka <[email protected]>
    lkpdn authored and tehcaster committed Nov 5, 2024
    Commit 9c9201a
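The aliasing approach described in the mm/slab commit above can be illustrated with a small user-space model. Everything here is hypothetical (`struct cache_sketch`, `create_cache()`, `create_buckets()`, the four bucket sizes); it is a sketch of the idea that a size below the minimum alignment reuses the cache of the aligned size instead of creating a second cache, not the slab implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_BUCKETS 4

/* Hypothetical bucket sizes; the kernel has many more. */
static const size_t bucket_size[NUM_BUCKETS] = { 8, 16, 32, 64 };

struct cache_sketch {
    size_t object_size;
};

static struct cache_sketch cache_pool[NUM_BUCKETS];
static int caches_created;

/* Stand-in for cache creation; counts how many caches really exist. */
static struct cache_sketch *create_cache(size_t size)
{
    assert(caches_created < NUM_BUCKETS);
    struct cache_sketch *c = &cache_pool[caches_created++];
    c->object_size = size;
    return c;
}

/* Index of the smallest bucket that can hold `size`, or -1 if none. */
static int index_for_size(size_t size)
{
    for (int i = 0; i < NUM_BUCKETS; i++)
        if (bucket_size[i] >= size)
            return i;
    return -1;
}

/*
 * Fill `buckets` (which must start out all NULL). Sizes smaller than
 * min_align are aliased to the cache of the aligned size, so that
 * cache is created exactly once.
 */
int create_buckets(struct cache_sketch *buckets[NUM_BUCKETS], size_t min_align)
{
    for (int idx = 0; idx < NUM_BUCKETS; idx++) {
        size_t size = bucket_size[idx] < min_align ? min_align
                                                   : bucket_size[idx];
        int aligned_idx = index_for_size(size);

        if (aligned_idx < 0)
            return -1;
        if (!buckets[aligned_idx])
            buckets[aligned_idx] = create_cache(bucket_size[aligned_idx]);
        if (idx != aligned_idx)
            buckets[idx] = buckets[aligned_idx];
    }
    return 0;
}
```

With a minimum alignment of 16, the 8-byte and 16-byte slots end up pointing at the same cache, and only three caches are created for four slots.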
  9. Merge tag 'qcom-clk-fixes-for-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into clk-fixes
    
    Pull Qualcomm clk driver fixes from Bjorn Andersson:
    
     - Correct flags for X Elite USB MP GDSC and pcie pipediv2 clocks
     - Fix alpha PLL post_div mask for the cases where width is not
       specified
     - Avoid hangs in the SM8350 video driver (venus) by setting HW_CTRL
       trigger feature on the video clocks
    
    * tag 'qcom-clk-fixes-for-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
      clk: qcom: gcc-x1e80100: Fix USB MP SS1 PHY GDSC pwrsts flags
      clk: qcom: gcc-x1e80100: Fix halt_check for pipediv2 clocks
      clk: qcom: clk-alpha-pll: Fix pll post div mask when width is not set
      clk: qcom: videocc-sm8350: use HW_CTRL_TRIGGER for vcodec GDSCs
    bebarino committed Nov 5, 2024
    Commit 714398d
  10. drm/xe: Fix possible exec queue leak in exec IOCTL

    In a couple of places after an exec queue is looked up the exec IOCTL
    returns on input errors without dropping the exec queue ref. Fix this by
    ensuring the exec queue ref is dropped on input error.
    
    Fixes: dd08ebf ("drm/xe: Introduce a new DRM driver for Intel GPUs")
    Cc: <[email protected]>
    Signed-off-by: Matthew Brost <[email protected]>
    Reviewed-by: Tejas Upadhyay <[email protected]>
    Reviewed-by: Rodrigo Vivi <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit 07064a2)
    Signed-off-by: Lucas De Marchi <[email protected]>
    mbrost05 authored and lucasdemarchi committed Nov 5, 2024
    Commit af797b8
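The leak pattern fixed by the exec queue commit above, returning early on bad input after a reference was already taken, is commonly avoided by routing every exit through one cleanup label. The sketch below shows that pattern in user-space C. All names (`exec_ioctl_sketch()`, `queue_lookup()`, `queue_put()`, `VALID_FLAGS`, `MAX_SYNCS`) are hypothetical and chosen for illustration; this is not the xe driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define VALID_FLAGS 0x1u    /* hypothetical set of accepted flag bits */
#define MAX_SYNCS   8u      /* hypothetical input limit */

struct exec_queue_sketch {
    int refcount;
};

/* One queue, holding its own initial reference. */
static struct exec_queue_sketch queue_table[1] = { { .refcount = 1 } };

/* Look up a queue by id and take a reference on it. */
static struct exec_queue_sketch *queue_lookup(unsigned int id)
{
    if (id >= 1)
        return NULL;
    queue_table[id].refcount++;
    return &queue_table[id];
}

static void queue_put(struct exec_queue_sketch *q)
{
    assert(q->refcount > 0);
    q->refcount--;
}

int exec_ioctl_sketch(unsigned int id, unsigned int flags,
                      unsigned int num_syncs)
{
    struct exec_queue_sketch *q = queue_lookup(id);
    int err = 0;

    if (!q)
        return -ENOENT;     /* no reference taken, nothing to drop */

    /* Every input error after the lookup goes through err_put. */
    if (flags & ~VALID_FLAGS) {
        err = -EINVAL;
        goto err_put;
    }
    if (num_syncs > MAX_SYNCS) {
        err = -EINVAL;
        goto err_put;
    }

    /* ... the real work would happen here ... */

err_put:
    queue_put(q);           /* drop the lookup reference on all paths */
    return err;
}
```

After any call, successful or not, the reference count is back at its initial value, which is the property the fix restores.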
  11. drm/xe: Drop VM dma-resv lock on xe_sync_in_fence_get failure in exec IOCTL
    
    Upon failure all locks need to be dropped before returning to the user.
    
    Fixes: 58480c1 ("drm/xe: Skip VMAs pin when requesting signal to the last XE_EXEC")
    Cc: <[email protected]>
    Signed-off-by: Matthew Brost <[email protected]>
    Reviewed-by: Tejas Upadhyay <[email protected]>
    Reviewed-by: Rodrigo Vivi <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit 7d1a425)
    Signed-off-by: Lucas De Marchi <[email protected]>
    mbrost05 authored and lucasdemarchi committed Nov 5, 2024
    Commit 64a2b6e
  12. drm/xe/pf: Fix potential GGTT allocation leak

    In the unlikely event that we fail while sending the new VF GGTT
    configuration to the GuC, we free only the GGTT node data
    struct but fail to release the actual GGTT allocation.
    
    This later leads to list corruption, a GGTT space leak and
    finally a risk of crashing when unloading the driver:
    
     [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-EIO)
     [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT
    
     [ ] list_add corruption. next->prev should be prev (ffff88813cfcd628), but was 0000000000000000. (next=ffff88813cfe2028).
     [ ] RIP: 0010:__list_add_valid_or_report+0x6b/0xb0
     [ ] Call Trace:
     [ ]  drm_mm_insert_node_in_range+0x2c0/0x4e0
     [ ]  xe_ggtt_node_insert+0x46/0x70 [xe]
     [ ]  pf_provision_vf_ggtt+0x7f5/0xa70 [xe]
     [ ]  xe_gt_sriov_pf_config_set_ggtt+0x5e/0x770 [xe]
     [ ]  ggtt_set+0x4b/0x70 [xe]
     [ ]  simple_attr_write_xsigned.constprop.0.isra.0+0xb0/0x110
    
     [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-ENOSPC)
     [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT
    
     [ ] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b7b: 0000 [#1] PREEMPT SMP NOPTI
     [ ] RIP: 0010:drm_mm_remove_node+0x1b7/0x390
     [ ] Call Trace:
     [ ]  <TASK>
     [ ]  ? die_addr+0x2e/0x80
     [ ]  ? exc_general_protection+0x1a1/0x3e0
     [ ]  ? asm_exc_general_protection+0x22/0x30
     [ ]  ? drm_mm_remove_node+0x1b7/0x390
     [ ]  ggtt_node_remove+0xa5/0xf0 [xe]
     [ ]  xe_ggtt_node_remove+0x35/0x70 [xe]
     [ ]  xe_ttm_bo_destroy+0x123/0x220 [xe]
     [ ]  intel_user_framebuffer_destroy+0x44/0x70 [xe]
     [ ]  intel_plane_destroy_state+0x3b/0xc0 [xe]
     [ ]  drm_atomic_state_default_clear+0x1cd/0x2f0
     [ ]  intel_atomic_state_clear+0x9/0x20 [xe]
     [ ]  __drm_atomic_state_free+0x1d/0xb0
    
    Fix that by using pf_release_ggtt() on the error path, which now
    works regardless of whether the node has a GGTT allocation.
    
    Fixes: 34e8042 ("drm/xe: Make xe_ggtt_node struct independent")
    Signed-off-by: Michal Wajdeczko <[email protected]>
    Cc: Rodrigo Vivi <[email protected]>
    Cc: Matthew Brost <[email protected]>
    Cc: Matthew Auld <[email protected]>
    Reviewed-by: Matthew Brost <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    (cherry picked from commit 43b1dd2)
    Signed-off-by: Lucas De Marchi <[email protected]>
    mwajdecz authored and lucasdemarchi committed Nov 5, 2024
    Commit a353c78
  13. drm/xe: Stop accumulating LRC timestamp on job_free

    The exec queue timestamp is only really useful when it's being queried
    through the fdinfo. There's no need to update it so often, on every
    job_free. Tracing a simple app like vkcube running shows an update
    rate of ~120 Hz. On discrete GPUs, the BO is in VRAM, creating a lot
    of PCIe transactions.
    
    The update on job_free() is used to cover a gap: if exec
    queue is created and destroyed rapidly, before a new query, the
    timestamp still needs to be accumulated and accounted for in the xef.
    
    Initial implementation in commit 6109f24 ("drm/xe: Add helper to
    accumulate exec queue runtime") couldn't do it on the exec_queue_fini
    since the xef could be gone at that point. However since commit
    ce8c161 ("drm/xe: Add ref counting for xe_file") the xef is
    refcounted and the exec queue always holds a reference, making this safe
    now.
    
    Improve the fix in commit 2149ded ("drm/xe: Fix use after free when
    client stats are captured") by reducing the frequency with which the
    update is needed.
    
    Fixes: 2149ded ("drm/xe: Fix use after free when client stats are captured")
    Reviewed-by: Nirmoy Das <[email protected]>
    Reviewed-by: Jonathan Cavitt <[email protected]>
    Reviewed-by: Umesh Nerlige Ramappa <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    Signed-off-by: Lucas De Marchi <[email protected]>
    (cherry picked from commit 83db047)
    Signed-off-by: Lucas De Marchi <[email protected]>
    lucasdemarchi committed Nov 5, 2024
    Commit 514447a

Commits on Nov 6, 2024

  1. mm/thp: fix deferred split queue not partially_mapped

    Recent changes are putting more pressure on THP deferred split queues:
    under load revealing long-standing races, causing list_del corruptions,
    "Bad page state"s and worse (I keep BUGs in both of those, so usually
    don't get to see how badly they end up without).  The relevant recent
    changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin,
    improved swap allocation, and underused THP splitting.
    
    The new unlocked list_del_init() in deferred_split_scan() is buggy.  I
    gave bad advice, it looks plausible since that's a local on-stack list,
    but the fact is that it can race with a third party freeing or migrating
    the preceding folio (properly unqueueing it with refcount 0 while holding
    split_queue_lock), thereby corrupting the list linkage.
    
    The obvious answer would be to take split_queue_lock there: but it has a
    long history of contention, so I'm reluctant to add to that.  Instead,
    make sure that there is always one safe (raised refcount) folio before, by
    delaying its folio_put().  (And of course I was wrong to suggest updating
    split_queue_len without the lock: leave that until the splice.)
    
    And remove two over-eager partially_mapped checks, restoring those tests
    to how they were before: if uncharge_folio() or free_tail_page_prepare()
    finds _deferred_list non-empty, it's in trouble whether or not that folio
    is partially_mapped (and the flag was already cleared in the latter case).
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: dafff3f ("mm: split underused THPs")
    Signed-off-by: Hugh Dickins <[email protected]>
    Acked-by: Usama Arif <[email protected]>
    Reviewed-by: David Hildenbrand <[email protected]>
    Reviewed-by: Baolin Wang <[email protected]>
    Acked-by: Zi Yan <[email protected]>
    Cc: Barry Song <[email protected]>
    Cc: Chris Li <[email protected]>
    Cc: Johannes Weiner <[email protected]>
    Cc: Kefeng Wang <[email protected]>
    Cc: Kirill A. Shutemov <[email protected]>
    Cc: Matthew Wilcox (Oracle) <[email protected]>
    Cc: Nhat Pham <[email protected]>
    Cc: Ryan Roberts <[email protected]>
    Cc: Shakeel Butt <[email protected]>
    Cc: Wei Yang <[email protected]>
    Cc: Yang Shi <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Hugh Dickins authored and akpm00 committed Nov 6, 2024
    Commit e66f318
  2. mm/thp: fix deferred split unqueue naming and locking

    Recent changes are putting more pressure on THP deferred split queues:
    under load revealing long-standing races, causing list_del corruptions,
    "Bad page state"s and worse (I keep BUGs in both of those, so usually
    don't get to see how badly they end up without).  The relevant recent
    changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin,
    improved swap allocation, and underused THP splitting.
    
    Before fixing locking: rename misleading folio_undo_large_rmappable(),
    which does not undo large_rmappable, to folio_unqueue_deferred_split(),
    which is what it does.  But that and its out-of-line __callee are mm
    internals of very limited usability: add comment and WARN_ON_ONCEs to
    check usage; and return a bool to say if a deferred split was unqueued,
    which can then be used in WARN_ON_ONCEs around safety checks (sparing
    callers the arcane conditionals in __folio_unqueue_deferred_split()).
    
    Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all
    of whose callers now call it beforehand (and if any forget then bad_page()
    will tell) - except for its caller put_pages_list(), which itself no
    longer has any callers (and will be deleted separately).
    
    Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0
    without checking and unqueueing a THP folio from deferred split list;
    which is unfortunate, since the split_queue_lock depends on the memcg
    (when memcg is enabled); so swapout has been unqueueing such THPs later,
    when freeing the folio, using the pgdat's lock instead: potentially
    corrupting the memcg's list.  __remove_mapping() has frozen refcount to 0
    here, so no problem with calling folio_unqueue_deferred_split() before
    resetting memcg_data.
    
    That goes back to 5.4 commit 87eaceb ("mm: thp: make deferred split
    shrinker memcg aware"): which included a check on swapcache before adding
    to deferred queue, but no check on deferred queue before adding THP to
    swapcache.  That worked fine with the usual sequence of events in reclaim
    (though there were a couple of rare ways in which a THP on deferred queue
    could have been swapped out), but 6.12 commit dafff3f ("mm: split
    underused THPs") avoids splitting underused THPs in reclaim, which makes
    swapcache THPs on deferred queue commonplace.
    
    Keep the check on swapcache before adding to deferred queue?  Yes: it is
    no longer essential, but preserves the existing behaviour, and is likely
    to be a worthwhile optimization (vmstat showed much more traffic on the
    queue under swapping load if the check was removed); update its comment.
    
    Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing
    folio->memcg_data without checking and unqueueing a THP folio from the
    deferred list, sometimes corrupting "from" memcg's list, like swapout. 
    Refcount is non-zero here, so folio_unqueue_deferred_split() can only be
    used in a WARN_ON_ONCE to validate the fix, which must be done earlier:
    mem_cgroup_move_charge_pte_range() first tries to split the THP (splitting
    of course unqueues), or skip it if that fails.  Not ideal, but moving
    charge has been requested, and khugepaged should repair the THP later:
    nobody wants new custom unqueueing code just for this deprecated case.
    
    The 87eaceb commit did have the code to move from one deferred list
    to another (but was not conscious of its unsafety while refcount non-0);
    but that was removed by 5.6 commit fac0516 ("mm: thp: don't need care
    deferred split queue in memcg charge move path"), which argued that the
    existence of a PMD mapping guarantees that the THP cannot be on a deferred
    list.  As above, false in rare cases, and now commonly false.
    
    Backport to 6.11 should be straightforward.  Earlier backports must take
    care that other _deferred_list fixes and dependencies are included.  There
    is not a strong case for backports, but they can fix corner cases.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 87eaceb ("mm: thp: make deferred split shrinker memcg aware")
    Fixes: dafff3f ("mm: split underused THPs")
    Signed-off-by: Hugh Dickins <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Reviewed-by: Yang Shi <[email protected]>
    Cc: Baolin Wang <[email protected]>
    Cc: Barry Song <[email protected]>
    Cc: Chris Li <[email protected]>
    Cc: Johannes Weiner <[email protected]>
    Cc: Kefeng Wang <[email protected]>
    Cc: Kirill A. Shutemov <[email protected]>
    Cc: Matthew Wilcox (Oracle) <[email protected]>
    Cc: Nhat Pham <[email protected]>
    Cc: Ryan Roberts <[email protected]>
    Cc: Shakeel Butt <[email protected]>
    Cc: Usama Arif <[email protected]>
    Cc: Wei Yang <[email protected]>
    Cc: Zi Yan <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Hugh Dickins authored and akpm00 committed Nov 6, 2024
    Commit f8f931b
  3. mm: avoid unsafe VMA hook invocation when error arises on mmap hook

    Patch series "fix error handling in mmap_region() and refactor
    (hotfixes)", v4.
    
    mmap_region() is somewhat terrifying, with spaghetti-like control flow and
    numerous means by which issues can arise and incomplete state, memory
    leaks and other unpleasantness can occur.
    
    A large amount of the complexity arises from trying to handle errors late
    in the process of mapping a VMA, which forms the basis of recently
    observed issues with resource leaks and observable inconsistent state.
    
    This series goes to great lengths to simplify how mmap_region() works and
    to avoid unwinding errors late on in the process of setting up the VMA for
    the new mapping, and equally avoids such operations occurring while the
    VMA is in an inconsistent state.
    
    The patches in this series comprise the minimal changes required to
    resolve existing issues in mmap_region() error handling, in order that
    they can be hotfixed and backported.  There is additionally a follow up
    series which goes further, separated out from the v1 series and sent and
    updated separately.
    
    
    This patch (of 5):
    
    After an attempted mmap() fails, we are no longer in a situation where we
    can safely interact with VMA hooks.  This is currently not enforced,
    meaning that we need complicated handling to ensure we do not incorrectly
    call these hooks.
    
    We can avoid the whole issue by treating the VMA as suspect the moment
    that the file->f_ops->mmap() function reports an error by replacing
    whatever VMA operations were installed with a dummy empty set of VMA
    operations.
    
    We do so through a new helper function internal to mm - mmap_file() -
    which is both more logically named than the existing call_mmap() function
    and correctly isolates handling of the vm_op reassignment to mm.
    
    All the existing invocations of call_mmap() outside of mm are ultimately
    nested within the call_mmap() from mm, which we now replace.
    
    It is therefore safe to leave call_mmap() in place as a convenience
    function (and to avoid churn).  The invokers are:
    
         ovl_file_operations -> mmap -> ovl_mmap() -> backing_file_mmap()
        coda_file_operations -> mmap -> coda_file_mmap()
         shm_file_operations -> shm_mmap()
    shm_file_operations_huge -> shm_mmap()
                dma_buf_fops -> dma_buf_mmap_internal -> i915_dmabuf_ops
    	                    -> i915_gem_dmabuf_mmap()
    
    None of these callers interact with vm_ops or mappings in a problematic
    way on error, quickly exiting out.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/d41fd763496fd0048a962f3fd9407dc72dd4fd86.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f65 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <[email protected]>
    Reported-by: Jann Horn <[email protected]>
    Reviewed-by: Liam R. Howlett <[email protected]>
    Reviewed-by: Vlastimil Babka <[email protected]>
    Reviewed-by: Jann Horn <[email protected]>
    Cc: Andreas Larsson <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: David S. Miller <[email protected]>
    Cc: Helge Deller <[email protected]>
    Cc: James E.J. Bottomley <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    lorenzo-stoakes authored and akpm00 committed Nov 6, 2024
    Commit 3dd6ed3
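The mechanism described in the mmap hook commit above, swapping in an empty operations table as soon as the mmap hook reports an error, can be sketched in user-space C. The types and names below (`struct vma_sketch`, `struct vm_ops_sketch`, `mmap_file_sketch()`, the two example hooks) are hypothetical simplifications made for illustration, not the mm code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct vma_sketch;

/* Hypothetical, reduced operations table. */
struct vm_ops_sketch {
    void (*open)(struct vma_sketch *vma);
    void (*close)(struct vma_sketch *vma);
};

struct vma_sketch {
    const struct vm_ops_sketch *ops;
    int close_calls;
};

/* Empty table: every hook is NULL, so nothing can be invoked through it. */
static const struct vm_ops_sketch dummy_ops = { .open = NULL, .close = NULL };

static void driver_close(struct vma_sketch *vma)
{
    vma->close_calls++;
}

static const struct vm_ops_sketch driver_ops = {
    .open = NULL,
    .close = driver_close,
};

typedef int (*mmap_hook_t)(struct vma_sketch *vma);

/* Example hook that installs its ops and then fails. */
static int failing_hook(struct vma_sketch *vma)
{
    vma->ops = &driver_ops;
    return -ENOMEM;
}

/* Example hook that installs its ops and succeeds. */
static int working_hook(struct vma_sketch *vma)
{
    vma->ops = &driver_ops;
    return 0;
}

/* Call the hook; on error, treat whatever ops it installed as suspect. */
int mmap_file_sketch(mmap_hook_t hook, struct vma_sketch *vma)
{
    int err;

    assert(hook != NULL && vma != NULL);
    err = hook(vma);
    if (err)
        vma->ops = &dummy_ops;
    return err;
}
```

Because the replacement happens in one place, later error handling does not need to know whether the hook installed operations before it failed.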
  4. mm: unconditionally close VMAs on error

    Incorrect invocation of VMA callbacks when the VMA is no longer in a
    consistent state is bug prone and risky to perform.
    
    With regard to the important vm_ops->close() callback, we have gone to
    great lengths to try to track whether or not we ought to close VMAs.
    
    Rather than doing so and risking making a mistake somewhere, instead
    unconditionally close and reset vma->vm_ops to an empty dummy operations
    set with a NULL .close operator.
    
    We introduce a new function to do so - vma_close() - and simplify existing
    vms logic which tracked whether we needed to close or not.
    
    This simplifies the logic, avoids incorrect double-calling of the .close()
    callback and allows us to update error paths to simply call vma_close()
    unconditionally - making VMA closure idempotent.
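
A minimal userspace sketch of the idempotent close described above. The
struct layouts, the close_calls counter and driver_close() are hypothetical
stand-ins for illustration only; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct vma;

/* Hypothetical, simplified stand-in for the VMA operations table. */
struct vm_ops {
	void (*close)(struct vma *vma);
};

struct vma {
	const struct vm_ops *ops;
	int close_calls;	/* illustration-only counter, not a kernel field */
};

/* Empty operations set: .close is NULL, so a second close is a no-op. */
const struct vm_ops dummy_vm_ops = { .close = NULL };

/* Hypothetical driver callback that records each invocation. */
void driver_close(struct vma *vma)
{
	vma->close_calls++;
}

const struct vm_ops driver_vm_ops = { .close = driver_close };

/* Invoke .close() at most once, then swap in the dummy operations. */
void vma_close(struct vma *vma)
{
	if (vma->ops && vma->ops->close) {
		vma->ops->close(vma);
		vma->ops = &dummy_vm_ops;
	}
}
```

Because the operations pointer is reset after the first call, an error path
can call vma_close() again without re-running the driver's callback.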
    
    Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f65 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <[email protected]>
    Reported-by: Jann Horn <[email protected]>
    Reviewed-by: Vlastimil Babka <[email protected]>
    Reviewed-by: Liam R. Howlett <[email protected]>
    Reviewed-by: Jann Horn <[email protected]>
    Cc: Andreas Larsson <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: David S. Miller <[email protected]>
    Cc: Helge Deller <[email protected]>
    Cc: James E.J. Bottomley <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    lorenzo-stoakes authored and akpm00 committed Nov 6, 2024
    4080ef1
  5. mm: refactor map_deny_write_exec()

    Refactor the map_deny_write_exec() to not unnecessarily require a VMA
    parameter but rather to accept VMA flags parameters, which allows us to
    use this function early in mmap_region() in a subsequent commit.
    
    While we're here, we refactor the function to be more readable and add
    some additional documentation.
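
A flags-only check of this kind might look roughly like the sketch below.
It assumes the usual MDWE rules (no mapping that is both writable and
executable, and no mapping that newly becomes executable), which this
commit message does not spell out. The mdwe_enabled parameter and the
function name are hypothetical, and the flag values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel defines its own VM_* constants. */
#define MODEL_VM_WRITE 0x2UL
#define MODEL_VM_EXEC  0x4UL

/*
 * Flags-only model of a deny-write-exec check. mdwe_enabled is a
 * hypothetical parameter standing in for per-process state.
 */
bool model_deny_write_exec(bool mdwe_enabled, unsigned long old_flags,
			   unsigned long new_flags)
{
	/* Nothing to deny if the policy is off. */
	if (!mdwe_enabled)
		return false;
	/* Nothing to deny if the new flags are not executable. */
	if (!(new_flags & MODEL_VM_EXEC))
		return false;
	/* Writable and executable at once is refused. */
	if (new_flags & MODEL_VM_WRITE)
		return true;
	/* A previously non-executable range becoming executable is refused. */
	if (!(old_flags & MODEL_VM_EXEC))
		return true;
	return false;
}
```

Since only flag words are needed, a caller can run the check before any VMA
object exists, passing the same flags as both old and new for a fresh mapping.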
    
    Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f65 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <[email protected]>
    Reported-by: Jann Horn <[email protected]>
    Reviewed-by: Liam R. Howlett <[email protected]>
    Reviewed-by: Vlastimil Babka <[email protected]>
    Reviewed-by: Jann Horn <[email protected]>
    Cc: Andreas Larsson <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: David S. Miller <[email protected]>
    Cc: Helge Deller <[email protected]>
    Cc: James E.J. Bottomley <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    lorenzo-stoakes authored and akpm00 committed Nov 6, 2024
    0fb4a7a
  6. mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling

    Currently MTE is permitted in two circumstances (desiring to use MTE
    having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is
    specified, as checked by arch_calc_vm_flag_bits() and actualised by
    setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is
    shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap
    hook is activated in mmap_region().
    
    The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also
    set is the arm64 implementation of arch_validate_flags().
    
    Unfortunately, we intend to refactor mmap_region() to perform this check
    earlier, meaning that in the case of a shmem backing we will not have
    invoked shmem_mmap() yet, causing the mapping to fail spuriously.
    
    It is inappropriate to set this architecture-specific flag in general mm
    code anyway, so a sensible resolution of this issue is to instead move the
    check somewhere else.
    
    We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via
    the arch_calc_vm_flag_bits() call.
    
    This is an appropriate place to do this as we already check for the
    MAP_ANONYMOUS case here, and the shmem file case is simply a variant of
    the same idea - we permit RAM-backed memory.
    
    This requires a modification to the arch_calc_vm_flag_bits() signature to
    pass in a pointer to the struct file associated with the mapping, however
    this is not too egregious as this is only used by two architectures anyway
    - arm64 and parisc.
    
    So this patch performs this adjustment and removes the unnecessary
    assignment of VM_MTE_ALLOWED in shmem_mmap().
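
The effect of passing the file into the flag calculation can be sketched in
userspace as follows. struct file_model, the mte_supported parameter and the
constant values are assumptions made for this illustration; they do not
reproduce the arm64 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only; not the kernel's actual constants. */
#define MODEL_MAP_ANONYMOUS  0x20UL
#define MODEL_VM_MTE_ALLOWED 0x1UL

/* Hypothetical stand-in for struct file: only records shmem backing. */
struct file_model {
	bool is_shmem;
};

/*
 * Return the extra VM flag bits for a mapping. Both RAM-backed cases,
 * anonymous memory and a shmem file, are handled in the same place.
 */
unsigned long model_calc_vm_flag_bits(const struct file_model *file,
				      unsigned long map_flags,
				      bool mte_supported)
{
	if (!mte_supported)
		return 0;
	if ((map_flags & MODEL_MAP_ANONYMOUS) || (file && file->is_shmem))
		return MODEL_VM_MTE_ALLOWED;
	return 0;
}
```

With the decision made up front from the file and the map flags, a later
validation step no longer depends on the file's mmap hook having run.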
    
    [[email protected]: fix whitespace, per Catalin]
    Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f65 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <[email protected]>
    Suggested-by: Catalin Marinas <[email protected]>
    Reported-by: Jann Horn <[email protected]>
    Reviewed-by: Catalin Marinas <[email protected]>
    Reviewed-by: Vlastimil Babka <[email protected]>
    Cc: Andreas Larsson <[email protected]>
    Cc: David S. Miller <[email protected]>
    Cc: Helge Deller <[email protected]>
    Cc: James E.J. Bottomley <[email protected]>
    Cc: Liam R. Howlett <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    lorenzo-stoakes authored and akpm00 committed Nov 6, 2024
    5baf8b0
  7. mm: resolve faulty mmap_region() error path behaviour

    The mmap_region() function is somewhat terrifying, with spaghetti-like
    control flow and numerous means by which issues can arise and incomplete
    state, memory leaks and other unpleasantness can occur.
    
    A large amount of the complexity arises from trying to handle errors late
    in the process of mapping a VMA, which forms the basis of recently
    observed issues with resource leaks and observable inconsistent state.
    
    Taking advantage of previous patches in this series we move a number of
    checks earlier in the code, simplifying things by moving the core of the
    logic into a static internal function __mmap_region().
    
    Doing this allows us to perform a number of checks up front before we do
    any real work, and allows us to unwind the writable unmap check
    unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE
    validation unconditionally also.
    
    We move a number of things here:
    
    1. We preallocate memory for the iterator before we call the file-backed
       memory hook, allowing us to exit early and avoid having to perform
       complicated and error-prone close/free logic. We carefully free
       iterator state on both success and error paths.
    
    2. The enclosing mmap_region() function handles the mapping_map_writable()
       logic early. Previously the logic had the mapping_map_writable() at the
       point of mapping a newly allocated file-backed VMA, and a matching
       mapping_unmap_writable() on success and error paths.
    
       We now do this unconditionally if this is a file-backed, shared writable
       mapping. A driver may change the flags to eliminate VM_MAYWRITE; however,
       doing so does not invalidate the seal check we just performed, and we in
       any case always decrement the counter in the wrapper.
    
       We perform a debug assert to ensure a driver does not attempt to do the
       opposite.
    
    3. We also move arch_validate_flags() up into the mmap_region()
       function. This is only relevant on arm64 and sparc64, and the check is
       only meaningful for SPARC with ADI enabled. We explicitly add a warning
       for this arch if a driver invalidates this check, though the code ought
       eventually to be fixed to eliminate the need for this.
    
    With all of these measures in place, we no longer need to explicitly close
    the VMA on error paths, as we place all checks which might fail prior to a
    call to any driver mmap hook.
    
    This eliminates an entire class of errors, makes the code easier to reason
    about and more robust.
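
The wrapper structure described above (validate first, take the writable
count, run the core logic, always drop the count) can be modelled in
userspace as below. Every name here is hypothetical, and the sketch assumes
the inner step takes its own long-lived reference on success; it illustrates
the shape of the control flow, not the actual mmap_region() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a file mapping's writable-mapping counter. */
struct mapping_model {
	int writable_count;
	bool sealed;		/* models a seal that forbids writable maps */
};

struct request {
	struct mapping_model *mapping;	/* NULL for anonymous mappings */
	bool shared_writable;
	bool flags_valid;	/* models the up-front flag validation */
	bool inner_fails;	/* lets a caller simulate a late failure */
};

static int map_writable(struct mapping_model *m)
{
	if (m->sealed)
		return -1;
	m->writable_count++;
	return 0;
}

static void unmap_writable(struct mapping_model *m)
{
	m->writable_count--;
}

/* Core work; assumed to take its own reference only when it succeeds. */
static int inner_map(const struct request *req)
{
	if (req->inner_fails)
		return -1;
	if (req->mapping && req->shared_writable)
		req->mapping->writable_count++;
	return 0;
}

/* Outer wrapper: check first, then balance the counter on every path. */
int outer_map(const struct request *req)
{
	bool writable = req->mapping && req->shared_writable;
	int ret;

	if (!req->flags_valid)
		return -1;
	if (writable && map_writable(req->mapping))
		return -1;

	ret = inner_map(req);

	/* Dropped unconditionally, on both success and error. */
	if (writable)
		unmap_writable(req->mapping);
	return ret;
}
```

Keeping the temporary count in the wrapper means the inner function has no
counter to unwind when it fails.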
    
    Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f65 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <[email protected]>
    Reported-by: Jann Horn <[email protected]>
    Reviewed-by: Liam R. Howlett <[email protected]>
    Reviewed-by: Vlastimil Babka <[email protected]>
    Tested-by: Mark Brown <[email protected]>
    Cc: Andreas Larsson <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: David S. Miller <[email protected]>
    Cc: Helge Deller <[email protected]>
    Cc: James E.J. Bottomley <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Peter Xu <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    lorenzo-stoakes authored and akpm00 committed Nov 6, 2024
    5de1950
  8. net: phy: ti: add PHY_RST_AFTER_CLK_EN flag

    The DP83848 datasheet (section 4.7.2) indicates that the reset pin should be
    toggled after the clocks are running. Add the PHY_RST_AFTER_CLK_EN to
    make sure that this indication is respected.
    
    In my experience not having this flag enabled would lead to, on some
    boots, the wrong MII mode being selected if the PHY was initialized on
    the bootloader and was receiving data during Linux boot.
    
    Signed-off-by: Diogo Silva <[email protected]>
    Reviewed-by: Andrew Lunn <[email protected]>
    Fixes: 34e45ad ("net: phy: dp83848: Add TI DP83848 Ethernet PHY")
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    DiogoSilva14 authored and kuba-moo committed Nov 6, 2024
    256748d
  9. mptcp: no admin perm to list endpoints

    During the switch to YNL, the command to list all endpoints has been
    accidentally restricted to users with admin permissions.
    
    It looks like there is no reason to have this restriction, which makes
    it harder for a user to quickly check if the endpoint list has been
    correctly populated by an automated tool. Best to go back to the
    previous behaviour then.
    
    mptcp_pm_gen.c has been modified using ynl-gen-c.py:
    
       $ ./tools/net/ynl/ynl-gen-c.py --mode kernel \
         --spec Documentation/netlink/specs/mptcp_pm.yaml --source \
         -o net/mptcp/mptcp_pm_gen.c
    
    The header file doesn't need to be regenerated.
    
    Fixes: 1d0507f ("net: mptcp: convert netlink from small_ops to ops")
    Cc: [email protected]
    Reviewed-by: Davide Caratti <[email protected]>
    Reviewed-by: Mat Martineau <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    matttbe authored and kuba-moo committed Nov 6, 2024
    cfbbd48
  10. mptcp: use sock_kfree_s instead of kfree

    The local address entries on userspace_pm_local_addr_list are allocated
    by sock_kmalloc().
    
    It's then required to use sock_kfree_s() instead of kfree() to free
    these entries in order to adjust the allocated size on the sk side.
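
Why the matching free matters can be shown with a small userspace model of
per-socket memory accounting. struct sock_model and the model_* helpers are
hypothetical; they only mimic the charge and uncharge behaviour described
above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, minimal model of a socket's option-memory accounting. */
struct sock_model {
	size_t omem_alloc;	/* bytes currently charged to this socket */
};

/* Allocate and charge the size to the socket. */
void *model_sock_kmalloc(struct sock_model *sk, size_t size)
{
	void *mem = malloc(size);

	if (mem)
		sk->omem_alloc += size;
	return mem;
}

/* Free and uncharge; the matching release for model_sock_kmalloc(). */
void model_sock_kfree_s(struct sock_model *sk, void *mem, size_t size)
{
	if (!mem)
		return;
	free(mem);
	sk->omem_alloc -= size;
}
```

Releasing such an entry with a plain free() would return the memory but
leave omem_alloc permanently inflated in this model.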
    
    Fixes: 24430f8 ("mptcp: add address into userspace pm list")
    Cc: [email protected]
    Signed-off-by: Geliang Tang <[email protected]>
    Reviewed-by: Matthieu Baerts (NGI0) <[email protected]>
    Signed-off-by: Matthieu Baerts (NGI0) <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Geliang Tang authored and kuba-moo committed Nov 6, 2024
    99635c9
  11. Merge branch 'mptcp-pm-fix-wrong-perm-and-sock-kfree'

    Matthieu Baerts says:
    
    ====================
    mptcp: pm: fix wrong perm and sock kfree
    
    Two small fixes related to the MPTCP path-manager:
    
    - Patch 1: remove an accidental restriction to admin users to list MPTCP
      endpoints. A regression from v6.7.
    
    - Patch 2: correctly use sock_kfree_s() instead of kfree() in the
      userspace PM. A fix for another fix introduced in v6.4 and
      backportable up to v5.19.
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kuba-moo committed Nov 6, 2024
    3f2f406
  12. Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/gi…

    …t/tnguy/net-queue
    
    Tony Nguyen says:
    
    ====================
    Intel Wired LAN Driver Updates 2024-11-04 (ice, idpf, i40e, e1000e)
    
    For ice:
    
    Marcin adjusts ordering of calls in ice_eswitch_detach() to resolve a
    use after free issue.
    
    Mateusz corrects variable type for Flow Director queue to fix issues
    related to drop actions.
    
    For idpf:
    
    Pavan resolves issues related to reset on idpf; avoiding use of freed
    vport and correctly unrolling the mailbox task.
    
    For i40e:
    
    Aleksandr fixes a race condition involving addition and deletion of VF
    MAC filters.
    
    For e1000e:
    
    Vitaly reverts workaround for Meteor Lake causing regressions in power
    management flows.
    
    * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
      e1000e: Remove Meteor Lake SMBUS workarounds
      i40e: fix race condition by adding filter's intermediate sync state
      idpf: fix idpf_vc_core_init error path
      idpf: avoid vport access in idpf_get_link_ksettings
      ice: change q_index variable type to s16 to store -1 value
      ice: Fix use after free during unload with ports in bridge
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kuba-moo committed Nov 6, 2024
    26a2beb
  13. KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it t…

    …o avoid spurious interrupts
    
    Running an L2 vCPU (see [1] for terminology) with LPCR_MER bit set and no
    pending interrupts results in that L2 vCPU getting an infinite flood of
    spurious interrupts. The 'if check' in kvmhv_run_single_vcpu() sets the
    LPCR_MER bit if there are pending interrupts.
    
    The spurious flood problem can be observed in 2 cases:
    1. Crashing the guest while an interrupt-heavy workload is running
       a. Start an L2 guest and run an interrupt-heavy workload (e.g. ipistorm)
       b. While the workload is running, crash the guest (make sure kdump
          is configured)
       c. Any one of the vCPUs of the guest will start getting an infinite
          flood of spurious interrupts.
    
    2. Running LTP stress tests in multiple guests at the same time
       a. Start 4 L2 guests.
       b. Start running LTP stress tests on all 4 guests at the same time.
       c. After some time, one or more of the vCPUs of any of the guests will
          start getting an infinite flood of spurious interrupts.
    
    The root cause of both the above issues is the same:
    1. An NMI is sent to a running vCPU that has LPCR_MER bit set.
    2. In the NMI path, all registers are refreshed, i.e. H_GUEST_GET_STATE
       is called for all the registers.
    3. When H_GUEST_GET_STATE is called for LPCR, the vcpu->arch.vcore->lpcr
       of that vCPU at L1 level gets updated with LPCR_MER set to 1, and this
       new value is always used whenever that vCPU runs, regardless of whether
       there was a pending interrupt.
    4. Since LPCR_MER is set, the vCPU in L2 always jumps to the external
       interrupt handler, and this cycle never ends.
    
    Fix the spurious flood by masking off the LPCR_MER bit before running an
    L2 vCPU to ensure that it is not set if there are no pending interrupts.
    
    [1] Terminology:
    1. L0 : PAPR hypervisor running in HV mode
    2. L1 : Linux guest (logical partition) running on top of L0
    3. L2 : KVM guest running on top of L1
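
The mask-then-set approach of the fix can be sketched as a pure function.
MODEL_LPCR_MER is a placeholder bit chosen for illustration and
lpcr_for_run() is a hypothetical name; neither is taken from the kernel
sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit for illustration; see the kernel headers for LPCR_MER. */
#define MODEL_LPCR_MER (UINT64_C(1) << 11)

/*
 * Compute the LPCR value to run a vCPU with: drop any stale MER bit
 * first, then set it only if an interrupt is actually pending.
 */
uint64_t lpcr_for_run(uint64_t saved_lpcr, bool irq_pending)
{
	uint64_t lpcr = saved_lpcr & ~MODEL_LPCR_MER;

	if (irq_pending)
		lpcr |= MODEL_LPCR_MER;
	return lpcr;
}
```

Even if a register refresh left the bit set in the saved value, the value
used for the next run depends only on whether an interrupt is pending.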
    
    Fixes: ec0f663 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")
    Cc: [email protected] # v6.8+
    Signed-off-by: Gautam Menghani <[email protected]>
    Signed-off-by: Madhavan Srinivasan <[email protected]>
    Gautam Menghani authored and maddy-kerneldev committed Nov 6, 2024
    a373830
  14. platform/x86: thinkpad_acpi: Fix for ThinkPad's with ECFW showing inc…

    …orrect fan speed
    
    Fix for ThinkPads with ECFW showing incorrect fan speed. Some models
    store the speed in the EC registers as decimal digits instead of a
    hexadecimal value. For example, the RPM register will hold 0x4200
    instead of 0x1068; here the actual RPM is "4200" in decimal.
    
    Add a quirk to handle this.
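
One way to picture such a conversion is to read each hexadecimal digit of
the register value as a decimal digit. The function below is a hypothetical
userspace sketch of that idea, not the code added to the driver.

```c
#include <assert.h>

/*
 * Reinterpret the hexadecimal digits of a 16-bit register value as
 * decimal digits, e.g. 0x4200 -> 4200. Returns -1 if any digit is not a
 * valid decimal digit (0-9).
 */
int fan_rpm_from_dec_register(unsigned int raw)
{
	int rpm = 0;
	int shift;

	for (shift = 12; shift >= 0; shift -= 4) {
		unsigned int digit = (raw >> shift) & 0xF;

		if (digit > 9)
			return -1;
		rpm = rpm * 10 + (int)digit;
	}
	return rpm;
}
```

For comparison, a model that stores plain hexadecimal would report 0x1068,
which already equals 4200 and needs no conversion.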
    
    Signed-off-by: Vishnu Sankar <[email protected]>
    Suggested-by: Mark Pearson <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Reviewed-by: Hans de Goede <[email protected]>
    Signed-off-by: Hans de Goede <[email protected]>
    vishnuocv authored and jwrdegoede committed Nov 6, 2024
    1be765b
  15. arm64/sve: Discard stale CPU state when handling SVE traps

    The logic for handling SVE traps manipulates saved FPSIMD/SVE state
    incorrectly, and a race with preemption can result in a task having
    TIF_SVE set and TIF_FOREIGN_FPSTATE clear even though the live CPU state
    is stale (e.g. with SVE traps enabled). This has been observed to result
    in warnings from do_sve_acc() where SVE traps are not expected while
    TIF_SVE is set:
    
    |         if (test_and_set_thread_flag(TIF_SVE))
    |                 WARN_ON(1); /* SVE access shouldn't have trapped */
    
    Warnings of this form have been reported intermittently, e.g.
    
      https://lore.kernel.org/linux-arm-kernel/CA+G9fYtEGe_DhY2Ms7+L7NKsLYUomGsgqpdBj+QwDLeSg=JhGg@mail.gmail.com/
      https://lore.kernel.org/linux-arm-kernel/[email protected]/
    
    The race can occur when the SVE trap handler is preempted before and
    after manipulating the saved FPSIMD/SVE state, starting and ending on
    the same CPU, e.g.
    
    | void do_sve_acc(unsigned long esr, struct pt_regs *regs)
    | {
    |         // Trap on CPU 0 with TIF_SVE clear, SVE traps enabled
    |         // task->fpsimd_cpu is 0.
    |         // per_cpu_ptr(&fpsimd_last_state, 0) is task.
    |
    |         ...
    |
    |         // Preempted; migrated from CPU 0 to CPU 1.
    |         // TIF_FOREIGN_FPSTATE is set.
    |
    |         get_cpu_fpsimd_context();
    |
    |         if (test_and_set_thread_flag(TIF_SVE))
    |                 WARN_ON(1); /* SVE access shouldn't have trapped */
    |
    |         sve_init_regs() {
    |                 if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) {
    |                         ...
    |                 } else {
    |                         fpsimd_to_sve(current);
    |                         current->thread.fp_type = FP_STATE_SVE;
    |                 }
    |         }
    |
    |         put_cpu_fpsimd_context();
    |
    |         // Preempted; migrated from CPU 1 to CPU 0.
    |         // task->fpsimd_cpu is still 0
    |         // If per_cpu_ptr(&fpsimd_last_state, 0) is still task then:
    |         // - Stale HW state is reused (with SVE traps enabled)
    |         // - TIF_FOREIGN_FPSTATE is cleared
    |         // - A return to userspace skips HW state restore
    | }
    
    Fix the case where the state is not live and TIF_FOREIGN_FPSTATE is set
    by calling fpsimd_flush_task_state() to detach from the saved CPU
    state. This ensures that a subsequent context switch will not reuse the
    stale CPU state, and will instead set TIF_FOREIGN_FPSTATE, forcing the
    new state to be reloaded from memory prior to a return to userspace.
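
The effect of detaching a task from the saved CPU state can be modelled in
userspace as below. The struct, the two-CPU limit and the model_* functions
are simplifications invented for this sketch; they follow the description
above rather than the arm64 sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 2

struct task_model {
	int fpsimd_cpu;		/* CPU whose registers last held this state */
	bool foreign_fpstate;	/* models TIF_FOREIGN_FPSTATE */
};

/* Per-CPU record of which task's state the CPU registers last held. */
struct task_model *last_state[MODEL_NR_CPUS];

/* On switch-in, decide whether the CPU registers can be reused as-is. */
void model_thread_switch(struct task_model *next, int cpu)
{
	bool wrong_task = last_state[cpu] != next;
	bool wrong_cpu = next->fpsimd_cpu != cpu;

	next->foreign_fpstate = wrong_task || wrong_cpu;
}

/* Detach the task from any CPU so stale registers are never reused. */
void model_flush_task_state(struct task_model *t)
{
	t->fpsimd_cpu = MODEL_NR_CPUS;	/* matches no real CPU */
	t->foreign_fpstate = true;
}
```

Without the flush, migrating away and back to the original CPU clears the
flag in this model; with the flush, the flag stays set and a reload from
memory is forced.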
    
    Fixes: cccb78c ("arm64/sve: Rework SVE access trap to convert state in registers")
    Reported-by: Mark Rutland <[email protected]>
    Signed-off-by: Mark Brown <[email protected]>
    Cc: [email protected]
    Reviewed-by: Mark Rutland <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Will Deacon <[email protected]>
    broonie authored and willdeacon committed Nov 6, 2024
    751ecf6
  16. USB: serial: qcserial: add support for Sierra Wireless EM86xx

    Add support for Sierra Wireless EM86xx with USB-id 0x1199:0x90e5 and
    0x1199:0x90e4.
    
    0x1199:0x90e5
    T:  Bus=03 Lev=01 Prnt=01 Port=05 Cnt=01 Dev#= 14 Spd=480  MxCh= 0
    D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
    P:  Vendor=1199 ProdID=90e5 Rev= 5.15
    S:  Manufacturer=Sierra Wireless, Incorporated
    S:  Product=Semtech EM8695 Mobile Broadband Adapter
    S:  SerialNumber=004403161882339
    C:* #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA
    A:  FirstIf#=12 IfCount= 2 Cls=02(comm.) Sub=0e Prot=00
    I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=qcserial
    E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
    E:  Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E:  Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=40 Driver=qcserial
    E:  Ad=84(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
    E:  Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E:  Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    I:* If#= 4 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none)
    E:  Ad=85(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
    I:* If#=12 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0e Prot=00 Driver=cdc_mbim
    E:  Ad=87(I) Atr=03(Int.) MxPS=  64 Ivl=32ms
    I:  If#=13 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
    I:* If#=13 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
    E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E:  Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    
    0x1199:0x90e4
    T:  Bus=03 Lev=01 Prnt=01 Port=05 Cnt=01 Dev#= 16 Spd=480  MxCh= 0
    D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
    P:  Vendor=1199 ProdID=90e4 Rev= 0.00
    S:  Manufacturer=Sierra Wireless, Incorporated
    S:  SerialNumber=004403161882339
    C:* #Ifs= 1 Cfg#= 1 Atr=a0 MxPwr=  2mA
    I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=10 Driver=qcserial
    E:  Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    E:  Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
    
    Signed-off-by: Jack Wu <[email protected]>
    Cc: [email protected]
    Signed-off-by: Johan Hovold <[email protected]>
    JackBBWu authored and jhovold committed Nov 6, 2024
    25eb47e
  17. ASoC: amd: yc: fix internal mic on Xiaomi Book Pro 14 2022

    Xiaomi Book Pro 14 2022 (MIA2210-AD) requires a quirk entry for its
    internal microphone to be enabled.
    
    This is likely due to similar reasons as seen previously on Redmi Book
    14/15 Pro 2022 models (since they likely came with similar firmware):
    
    - commit dcff8b7 ("ASoC: amd: yc: Add Xiaomi Redmi Book Pro 15 2022
      into DMI table")
    - commit c1dd6bf ("ASoC: amd: yc: Add Xiaomi Redmi Book Pro 14 2022
      into DMI table")
    
    A quirk would likely be needed for Xiaomi Book Pro 15 2022 models, too.
    However, I do not have such a device on hand so I will leave it for now.
    
    Signed-off-by: Mingcong Bai <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    MingcongBai authored and broonie committed Nov 6, 2024
    de156f3
  18. Merge tag 'hid-for-linus-20241105' of git://git.kernel.org/pub/scm/li…

    …nux/kernel/git/hid/hid
    
    Pull HID fix from Jiri Kosina:
    
     - report buffer sanitization fix for HID core (Jiri Kosina)
    
    * tag 'hid-for-linus-20241105' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
      HID: core: zero-initialize the report buffer
    torvalds committed Nov 6, 2024
    0951fed
  19. Merge tag 'for-6.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux…

    …/kernel/git/device-mapper/linux-dm
    
    Pull device mapper fixes from Mikulas Patocka:
    
     - fix memory safety bugs in dm-cache
    
     - fix restart/panic logic in dm-verity
    
     - fix 32-bit unsigned integer overflow in dm-unstriped
    
     - fix a device mapper crash if blk_alloc_disk fails
    
    * tag 'for-6.12/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
      dm cache: fix potential out-of-bounds access on the first resume
      dm cache: optimize dirty bit checking with find_next_bit when resizing
      dm cache: fix out-of-bounds access to the dirty bitset when resizing
      dm cache: fix flushing uninitialized delayed_work on cache_ctr error
      dm cache: correct the number of origin blocks to match the target length
      dm-verity: don't crash if panic_on_corruption is not selected
      dm-unstriped: cast an operand to sector_t to prevent potential uint32_t overflow
      dm: fix a crash if blk_alloc_disk fails
    torvalds committed Nov 6, 2024
    9e23acf
  20. Merge tag 'platform-drivers-x86-v6.12-4' of git://git.kernel.org/pub/…

    …scm/linux/kernel/git/pdx86/platform-drivers-x86
    
    Pull x86 platform driver fixes from Hans de Goede:
    
     - AMD PMF: Add new hardware id
    
     - AMD PMC: Fix crash when loaded with enable_stb=1 on devices without STB
    
     - Dell: Add Alienware hwid for Alienware systems with Dell WMI interface
    
     - thinkpad_acpi: Quirk to fix wrong fan speed readings on L480
    
     - New hotkey mappings for Dell and Lenovo laptops
    
    * tag 'platform-drivers-x86-v6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
      platform/x86: thinkpad_acpi: Fix for ThinkPad's with ECFW showing incorrect fan speed
      platform/x86: ideapad-laptop: add missing Ideapad Pro 5 fn keys
      platform/x86: dell-wmi-base: Handle META key Lock/Unlock events
      platform/x86: dell-smbios-base: Extends support to Alienware products
      platform/x86/amd/pmc: Detect when STB is not available
      platform/x86/amd/pmf: Add SMU metrics table support for 1Ah family 60h model
    torvalds committed Nov 6, 2024
    b226d01
  21. Merge tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/k…

    …ernel/git/trace/linux-trace
    
    Pull tracefs fixes from Steven Rostedt:
     "Fix tracefs mount options.
    
      Commit 78ff640 ("vfs: Convert tracefs to use the new mount API")
      broke the gid setting when set by fstab or other mount utility. It is
      ignored when it is set. Fix the code so that it recognises the option
      again and will honor the settings on mount at boot up.
    
      Update the internal documentation and create a selftest to make sure
      it doesn't break again in the future"
    
    * tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
      tracing/selftests: Add tracefs mount options test
      tracing: Document tracefs gid mount option
      tracing: Fix tracefs mount options
    torvalds committed Nov 6, 2024
    7758b20
  22. Merge tag 'keys-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/…

    …kernel/git/jarkko/linux-tpmdd
    
    Pull keys fixes from Jarkko Sakkinen:
     "A couple of fixes for keys and trusted keys"
    
    * tag 'keys-next-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
      KEYS: trusted: dcp: fix NULL dereference in AEAD crypto operation
      security/keys: fix slab-out-of-bounds in key_task_permission
    torvalds committed Nov 6, 2024
    f43b156
  23. ACPI: processor: Move arch_init_invariance_cppc() call later

    arch_init_invariance_cppc() is called at the end of
    acpi_cppc_processor_probe() in order to configure frequency invariance
    based upon the values from _CPC.
    
    This however doesn't work on AMD CPPC shared memory designs that have
    AMD preferred cores enabled because _CPC needs to be analyzed from all
    cores to judge if preferred cores are enabled.
    
    This issue manifests to users as a warning since commit 21fb59a
    ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"):
    ```
    Could not retrieve highest performance (-19)
    ```
    
    However, the warning isn't the cause of this; it was actually
    commit 279f838 ("x86/amd: Detect preferred cores in
    amd_get_boost_ratio_numerator()") which exposed the issue.
    
    To fix this problem, change arch_init_invariance_cppc() into a new weak
    symbol that is called at the end of acpi_processor_driver_init().
    Each architecture that supports it can declare the symbol to override
    the weak one.
    
    Define it for x86, in arch/x86/kernel/acpi/cppc.c, and for all of the
    architectures using the generic arch_topology.c code.
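
The ordering change can be illustrated with a weak default that runs once
after every CPU has been probed. This is a userspace sketch using the
GCC/Clang weak attribute; all model_* names are hypothetical, and the
default here records what it could see purely so the ordering is observable.

```c
#include <assert.h>

int model_cpus_probed;		/* CPUs seen so far */
int model_cpus_at_init = -1;	/* CPUs visible when the hook ran */

/*
 * Weak default: an architecture could supply a strong definition with
 * the same name to override it at link time.
 */
__attribute__((weak)) void model_arch_init_invariance(void)
{
	model_cpus_at_init = model_cpus_probed;
}

/* Per-CPU probe; it no longer calls the hook itself. */
void model_probe_cpu(void)
{
	model_cpus_probed++;
}

/* Driver init: probe every CPU first, then run the hook exactly once. */
void model_driver_init(int nr_cpus)
{
	int i;

	for (i = 0; i < nr_cpus; i++)
		model_probe_cpu();
	model_arch_init_invariance();
}
```

Because the hook runs after the probe loop, it can take information from
all cores into account rather than only those probed so far.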
    
    Fixes: 279f838 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()")
    Reported-by: Ivan Shapovalov <[email protected]>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219431
    Tested-by: Oleksandr Natalenko <[email protected]>
    Signed-off-by: Mario Limonciello <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    [ rjw: Changelog edit ]
    Signed-off-by: Rafael J. Wysocki <[email protected]>
    superm1 authored and rafaeljw committed Nov 6, 2024
    b79276d
  24. ASoC: SOF: amd: Fix for incorrect DMA ch status register offset

    The DMA channel status register offset changed on the acp7.0 platform.
    
    Checking the wrong DMA channel status register offset leads to
    firmware boot failure.
    
    [   14.432497] snd_sof_amd_acp70 0000:c4:00.5: ------------[ DSP dump start ]------------
    [   14.432533] snd_sof_amd_acp70 0000:c4:00.5: Firmware boot failure due to timeout
    [   14.432549] snd_sof_amd_acp70 0000:c4:00.5: fw_state: SOF_FW_BOOT_IN_PROGRESS (3)
    [   14.432610] snd_sof_amd_acp70 0000:c4:00.5: invalid header size 0x71c41000. FW oops is bogus
    [   14.432626] snd_sof_amd_acp70 0000:c4:00.5: unexpected fault 0x71c40000 trace 0x71c40000
    [   14.432642] snd_sof_amd_acp70 0000:c4:00.5: ------------[ DSP dump end ]------------
    [   14.432657] snd_sof_amd_acp70 0000:c4:00.5: error: failed to boot DSP firmware -5
    [   14.432672] snd_sof_amd_acp70 0000:c4:00.5: fw_state change: 3 -> 4
    [   14.433260] dmic-codec dmic-codec: ASoC: Unregistered DAI 'dmic-hifi'
    [   14.433319] snd_sof_amd_acp70 0000:c4:00.5: fw_state change: 4 -> 0
    [   14.433358] snd_sof_amd_acp70 0000:c4:00.5: error: sof_probe_work failed err: -5
    
    Use the correct offset for the DMA channel status register.
    
    Fixes: 490be7b ("ASoC: SOF: amd: add support for acp7.0 based platform")
    
    Signed-off-by: Venkata Prasad Potturu <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Venkata-Prasad-Potturu authored and broonie committed Nov 6, 2024
    94debe5
  25. media: dvbdev: fix the logic when DVB_DYNAMIC_MINORS is not set

    When CONFIG_DVB_DYNAMIC_MINORS is not set, ret is not initialized, and a
    semaphore is left in the wrong state in case of errors.
    
    Make the code simpler and avoid mistakes by having just one error
    check logic, used whether DVB_DYNAMIC_MINORS is set or not.
    
    Reported-by: kernel test robot <[email protected]>
    Reported-by: Dan Carpenter <[email protected]>
    Closes: https://lore.kernel.org/r/[email protected]/
    Signed-off-by: Mauro Carvalho Chehab <[email protected]>
    Link: https://lore.kernel.org/r/9e067488d8935b8cf00959764a1fa5de85d65725.1730926254.git.mchehab+huawei@kernel.org
    mchehab committed Nov 6, 2024
    a4aebaf
  26. Merge tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs
    
    Pull NFS client fixes from Anna Schumaker:
     "These are mostly fixes that came up during the nfs bakeathon the other
      week.
    
      Stable Fixes:
       - Fix KMSAN warning in decode_getfattr_attrs()
    
      Other Bugfixes:
       - Handle -ENOTCONN in xs_tcp_setup_socket()
       - NFSv3: only use NFS timeout for MOUNT when protocols are compatible
       - Fix attribute delegation behavior on exclusive create and a/mtime
         changes
       - Fix localio to cope with racing nfs_local_probe()
       - Avoid i_lock contention in nfs_clear_invalid_mapping()"
    
    * tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
      nfs: avoid i_lock contention in nfs_clear_invalid_mapping
      nfs_common: fix localio to cope with racing nfs_local_probe()
      NFS: Further fixes to attribute delegation a/mtime changes
      NFS: Fix attribute delegation behaviour on exclusive create
      nfs: Fix KMSAN warning in decode_getfattr_attrs()
      NFSv3: only use NFS timeout for MOUNT when protocols are compatible
      sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()
    torvalds committed Nov 6, 2024
    ff7afae
  27. irqchip/gic-v3: Force propagation of the active state with a read-back

    Christoffer reports that on some implementations, writing to
    GICR_ISACTIVER0 (and similar GICD registers) can race badly with a guest
    issuing a deactivation of that interrupt via the system register interface.
    
    There are multiple reasons for this:
    
     - this uses an early write-acknowledgement memory type (nGnRE), meaning
       that the write may only have made it as far as some interconnect
       by the time the store is considered "done"
    
     - the GIC itself is allowed to buffer the write until it decides to
       take it into account (as long as it is in finite time)
    
    The effects are that the activation may not have taken effect by the time
    the kernel enters the guest, forcing an immediate exit, or that a guest
    deactivation occurs before the interrupt is active, doing nothing.
    
    In order to guarantee that the write to the ISACTIVER register has taken
    effect, read back from it, forcing the interconnect to propagate the write,
    and the GIC to process the write before returning the read.
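
A minimal sketch of the write-then-read-back pattern the message describes. The helper name is hypothetical, and a plain variable stands in for the device register, so this only illustrates the shape of the access sequence and not the hardware ordering guarantees.

```c
#include <stdint.h>

/*
 * Write a value to a register, then read the same register back.
 * Per the commit message, the read forces the interconnect to propagate
 * the write and the device to process it before the read returns.
 */
static inline uint32_t write_then_read_back(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;   /* the write may still be in flight when the store retires */
    return *reg;  /* read-back: completes only after the write took effect */
}
```

The returned value is normally discarded; the read is issued purely for its ordering side effect.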
    
    Reported-by: Christoffer Dall <[email protected]>
    Signed-off-by: Marc Zyngier <[email protected]>
    Signed-off-by: Thomas Gleixner <[email protected]>
    Acked-by: Christoffer Dall <[email protected]>
    Cc: [email protected]
    Link: https://lore.kernel.org/all/[email protected]
    Marc Zyngier authored and KAGA-KOKO committed Nov 6, 2024
    464cb98

Commits on Nov 7, 2024

  1. btrfs: fix per-subvolume RO/RW flags with new mount API

    [BUG]
    With util-linux 2.40.2, the 'mount' utility is already utilizing the new
    mount API. e.g:
    
      # strace  mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test/
      ...
      fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0
      fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0
      fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0
      fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
      fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
      mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0
      move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0
    
    But this leads to a new problem, that per-subvolume RO/RW mount no
    longer works, if the initial mount is RO:
    
      # mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test
      # mount -o rw,subvol=subv2 /dev/test/scratch1  /mnt/scratch
      # mount | grep mnt
      /dev/mapper/test-scratch1 on /mnt/test type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=256,subvol=/subv1)
      /dev/mapper/test-scratch1 on /mnt/scratch type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=257,subvol=/subv2)
      # touch /mnt/scratch/foobar
      touch: cannot touch '/mnt/scratch/foobar': Read-only file system
    
    This is a common use case on distros.
    
    [CAUSE]
    We have a workaround for remount to handle the RO->RW change, but if the
    mount is using the new mount API, we do not do that, and rely on the
    mount tool NOT to set the ro flag.
    
    But that's not what the mount tool does with the new API:
    
      fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0
      fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0
      fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0       <<<< Setting RO flag for super block
      fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
      fsmount(3, FSMOUNT_CLOEXEC, 0)          = 4
      mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0
      move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0
    
    This means we will set the super block RO at the first mount.
    
    Later RW mount will not try to reconfigure the fs to RW because the
    mount tool is already using the new API.
    
    This totally breaks the per-subvolume RO/RW mount behavior.
    
    [FIX]
    Do not skip the reconfiguration even if using the new API.  The old
    comments are just expecting any mount tool to properly skip the RO flag
    set even if we specify "ro", which is not the reality.
    
    Update the comments regarding the backward compatibility on the kernel
    level so it works with old and new mount utilities.
    
    CC: [email protected] # 6.8+
    Fixes: f044b31 ("btrfs: handle the ro->rw transition for mounting different subvolumes")
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 7, 2024
    cda7163
  2. btrfs: reinitialize delayed ref list after deleting it from the list

    At insert_delayed_ref() if we need to update the action of an existing
    ref to BTRFS_DROP_DELAYED_REF, we delete the ref from its ref head's
    ref_add_list using list_del(), which leaves the ref's add_list member
    not reinitialized, as list_del() sets the next and prev members of the
    list to LIST_POISON1 and LIST_POISON2, respectively.
    
    If later we end up calling drop_delayed_ref() against the ref, which can
    happen during merging or when destroying delayed refs due to a transaction
    abort, we can trigger a crash since at drop_delayed_ref() we call
    list_empty() against the ref's add_list, which returns false since
    the list was not reinitialized after the list_del() and as a consequence
    we call list_del() again at drop_delayed_ref(). This results in an
    invalid list access since the next and prev members are set to poison
    pointers, resulting in a splat if CONFIG_LIST_HARDENED and
    CONFIG_DEBUG_LIST are set or invalid poison pointer dereferences
    otherwise.
    
    So fix this by deleting from the list with list_del_init() instead.
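
The difference between the two deletion styles can be sketched in userspace with a hand-rolled circular list. All names below are hypothetical stand-ins rather than the kernel's `list.h`, and the poison values are arbitrary markers chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

/* Arbitrary marker values standing in for the kernel's poison pointers. */
#define SKETCH_POISON1 ((struct list_node *)0x100)
#define SKETCH_POISON2 ((struct list_node *)0x122)

static void node_init(struct list_node *n) { n->next = n; n->prev = n; }

static bool node_empty(const struct list_node *n) { return n->next == n; }

static void node_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void node_unlink(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Like list_del(): unlink and poison, so the node does not look empty. */
static void node_del(struct list_node *n)
{
    node_unlink(n);
    n->next = SKETCH_POISON1;
    n->prev = SKETCH_POISON2;
}

/* Like list_del_init(): unlink and point the node back at itself. */
static void node_del_init(struct list_node *n)
{
    node_unlink(n);
    node_init(n);
}
```

After `node_del()`, an emptiness check on the removed node reports "not empty", which is what leads a later cleanup path to unlink it a second time through the poisoned pointers. After `node_del_init()`, the same check reports "empty" and the second unlink is skipped.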
    
    Fixes: 1d57ee9 ("btrfs: improve delayed refs iterations")
    CC: [email protected] # 4.19+
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 7, 2024
    c9a75ec
  3. btrfs: fix the length of reserved qgroup to free

    The dealloc flag may be cleared and the extent won't reach the disk in
    the error paths of cow_file_range. The reserved qgroup space is freed in
    commit 30479f3 ("btrfs: fix qgroup reserve leaks in
    cow_file_range"). However, the length of the untouched region to free needs
    to be adjusted to the correct remaining region size.
    
    Fixes: 30479f3 ("btrfs: fix qgroup reserve leaks in cow_file_range")
    CC: [email protected] # 6.11+
    Reviewed-by: Qu Wenruo <[email protected]>
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Haisu Wang <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Haisu Wang authored and kdave committed Nov 7, 2024
    2b084d8
  4. net: vertexcom: mse102x: Fix possible double free of TX skb

    The scope of the TX skb is wider than just mse102x_tx_frame_spi(),
    so in case the TX skb room needs to be expanded, we should free the
    temporary skb instead of the original skb. Otherwise the original
    TX skb pointer would be freed again in mse102x_tx_work(), which leads
    to crashes:
    
      Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP
      CPU: 0 PID: 712 Comm: kworker/0:1 Tainted: G      D            6.6.23
      Hardware name: chargebyte Charge SOM DC-ONE (DT)
      Workqueue: events mse102x_tx_work [mse102x]
      pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : skb_release_data+0xb8/0x1d8
      lr : skb_release_data+0x1ac/0x1d8
      sp : ffff8000819a3cc0
      x29: ffff8000819a3cc0 x28: ffff0000046daa60 x27: ffff0000057f2dc0
      x26: ffff000005386c00 x25: 0000000000000002 x24: 00000000ffffffff
      x23: 0000000000000000 x22: 0000000000000001 x21: ffff0000057f2e50
      x20: 0000000000000006 x19: 0000000000000000 x18: ffff00003fdacfcc
      x17: e69ad452d0c49def x16: 84a005feff870102 x15: 0000000000000000
      x14: 000000000000024a x13: 0000000000000002 x12: 0000000000000000
      x11: 0000000000000400 x10: 0000000000000930 x9 : ffff00003fd913e8
      x8 : fffffc00001bc008
      x7 : 0000000000000000 x6 : 0000000000000008
      x5 : ffff00003fd91340 x4 : 0000000000000000 x3 : 0000000000000009
      x2 : 00000000fffffffe x1 : 0000000000000000 x0 : 0000000000000000
      Call trace:
       skb_release_data+0xb8/0x1d8
       kfree_skb_reason+0x48/0xb0
       mse102x_tx_work+0x164/0x35c [mse102x]
       process_one_work+0x138/0x260
       worker_thread+0x32c/0x438
       kthread+0x118/0x11c
       ret_from_fork+0x10/0x20
      Code: aa1303e0 97fffab6 72001c1f 54000141 (f9400660)
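
The ownership rule behind this fix can be sketched in userspace. The `struct frame` type and its helpers are hypothetical stand-ins for an skb, assuming the caller always frees the original buffer after the send function returns.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a payload plus its capacity. */
struct frame {
    size_t len;
    size_t cap;
    unsigned char *data;
};

static struct frame *frame_alloc(size_t len, size_t cap)
{
    struct frame *f = malloc(sizeof(*f));

    if (!f)
        return NULL;
    f->data = calloc(cap ? cap : 1, 1);
    if (!f->data) {
        free(f);
        return NULL;
    }
    f->len = len;
    f->cap = cap;
    return f;
}

static void frame_free(struct frame *f)
{
    if (f) {
        free(f->data);
        free(f);
    }
}

/*
 * Send one frame with `pad` extra bytes appended. The caller keeps
 * ownership of `orig` and frees it afterwards, so this function may
 * only free the temporary copy it creates itself.
 * Returns the number of bytes "sent", or -1 on allocation failure.
 */
static long frame_send_padded(const struct frame *orig, size_t pad)
{
    const struct frame *tx = orig;
    struct frame *tmp = NULL;
    long sent;

    if (orig->len + pad > orig->cap) {
        tmp = frame_alloc(orig->len + pad, orig->len + pad);
        if (!tmp)
            return -1;
        memcpy(tmp->data, orig->data, orig->len);
        tx = tmp;
    }
    sent = (long)(orig->len + pad); /* the bus transfer of tx would go here */
    (void)tx;
    frame_free(tmp);                /* never frame_free(orig) in this function */
    return sent;
}
```

Freeing `orig` inside `frame_send_padded()` would leave the caller holding a dangling pointer, and the caller's own free would then be the second one.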
    
    Cc: [email protected]
    Fixes: 2f207cb ("net: vertexcom: Add MSE102x SPI support")
    Signed-off-by: Stefan Wahren <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    lategoodbye authored and kuba-moo committed Nov 7, 2024
    1f26339
  5. net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case

    Commit a23aa04 ("net: stmmac: ethtool: Fixed calltrace caused by
    unbalanced disable_irq_wake calls") introduced checks to prevent
    unbalanced enable and disable IRQ wake calls. However it only
    initialized the auxiliary variable on one of the paths,
    stmmac_request_irq_multi_msi(), missing the other,
    stmmac_request_irq_single().
    
    Add the same initialization on stmmac_request_irq_single() to prevent
    "Unbalanced IRQ <x> wake disable" warnings from being printed the first
    time disable_irq_wake() is called on platforms that run on that code
    path.
    
    Fixes: a23aa04 ("net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls")
    Signed-off-by: Nícolas F. R. A. Prado <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Link: https://patch.msgid.link/20241101-stmmac-unbalanced-wake-single-fix-v1-1-5952524c97f0@collabora.com
    Signed-off-by: Paolo Abeni <[email protected]>
    nfraprado authored and Paolo Abeni committed Nov 7, 2024
    25d7070
  6. arm64: smccc: Remove broken support for SMCCCv1.3 SVE discard hint

    SMCCCv1.3 added a hint bit which callers can set in an SMCCC function ID
    (AKA "FID") to indicate that it is acceptable for the SMCCC
    implementation to discard SVE and/or SME state over a specific SMCCC
    call. The kernel support for using this hint is broken and SMCCC calls
    may clobber the SVE and/or SME state of arbitrary tasks, though FPSIMD
    state is unaffected.
    
    The kernel support is intended to use the hint when there is no SVE or
    SME state to save, and to do this it checks whether TIF_FOREIGN_FPSTATE
    is set or TIF_SVE is clear in assembly code:
    
    |        ldr     <flags>, [<current_task>, #TSK_TI_FLAGS]
    |        tbnz    <flags>, #TIF_FOREIGN_FPSTATE, 1f   // Any live FP state?
    |        tbnz    <flags>, #TIF_SVE, 2f               // Does that state include SVE?
    |
    | 1:     orr     <fid>, <fid>, ARM_SMCCC_1_3_SVE_HINT
    | 2:
    |        << SMCCC call using FID >>
    
    This is not safe as-is:
    
    (1) SMCCC calls can be made in a preemptible context and preemption can
        result in TIF_FOREIGN_FPSTATE being set or cleared at arbitrary
        points in time. Thus checking for TIF_FOREIGN_FPSTATE provides no
        guarantee.
    
    (2) TIF_FOREIGN_FPSTATE only indicates that the live FP/SVE/SME state in
        the CPU does not belong to the current task, and does not indicate
        that clobbering this state is acceptable.
    
        When the live CPU state is clobbered it is necessary to update
        fpsimd_last_state.st to ensure that a subsequent context switch will
        reload FP/SVE/SME state from memory rather than consuming the
        clobbered state. This and the SMCCC call itself must happen in a
        critical section with preemption disabled to avoid races.
    
    (3) Live SVE/SME state can exist with TIF_SVE clear (e.g. with only
        TIF_SME set), and checking TIF_SVE alone is insufficient.
    
    Remove the broken support for the SMCCCv1.3 SVE saving hint. This is
    effectively a revert of commits:
    
    * cfa7ff9 ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint")
    * a7c3acc ("arm64: smccc: Save lr before calling __arm_smccc_sve_check()")
    
    ... leaving behind the ARM_SMCCC_VERSION_1_3 and ARM_SMCCC_1_3_SVE_HINT
    definitions, since these are simply definitions from the SMCCC
    specification, and the latter is used in KVM via ARM_SMCCC_CALL_HINTS.
    
    If we want to bring this back in future, we'll probably want to handle
    this logic in C where we can use all the usual FPSIMD/SVE/SME helper
    functions, and that'll likely require some rework of the SMCCC code
    and/or its callers.
    
    Fixes: cfa7ff9 ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint")
    Signed-off-by: Mark Rutland <[email protected]>
    Cc: Ard Biesheuvel <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: Marc Zyngier <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: [email protected]
    Reviewed-by: Mark Brown <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Will Deacon <[email protected]>
    Mark Rutland authored and willdeacon committed Nov 7, 2024
    8c462d5
  7. arm64: Kconfig: Make SME depend on BROKEN for now

    Although support for SME was merged in v5.19, we've since uncovered a
    number of issues with the implementation, including issues which might
    corrupt the FPSIMD/SVE/SME state of arbitrary tasks. While there are
    patches to address some of these issues, ongoing review has highlighted
    additional functional problems, and more time is necessary to analyse
    and fix these.
    
    For now, mark SME as BROKEN in the hope that we can fix things properly
    in the near future. As SME is an OPTIONAL part of ARMv9.2+, and there is
    very little extant hardware, this should not adversely affect the vast
    majority of users.
    
    Signed-off-by: Mark Rutland <[email protected]>
    Cc: Ard Biesheuvel <[email protected]>
    Cc: Catalin Marinas <[email protected]>
    Cc: Marc Zyngier <[email protected]>
    Cc: Mark Brown <[email protected]>
    Cc: Will Deacon <[email protected]>
    Cc: [email protected] # 5.19
    Acked-by: Catalin Marinas <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Will Deacon <[email protected]>
    Mark Rutland authored and willdeacon committed Nov 7, 2024
    81235ae
  8. netfilter: nf_tables: wait for rcu grace period on net_device removal

    8c873e2 ("netfilter: core: free hooks with call_rcu") removed
    synchronize_net() call when unregistering basechain hook, however,
    net_device removal event handler for the NFPROTO_NETDEV was not updated
    to wait for RCU grace period.
    
    Note that 835b803 ("netfilter: nf_tables_netdev: unregister hooks
    on net_device removal") does not remove basechain rules on device
    removal, I was hinted to remove rules on net_device removal later, see
    5ebe0b0 ("netfilter: nf_tables: destroy basechain and rules on
    netdevice removal").
    
    Although NETDEV_UNREGISTER event is guaranteed to be handled after
    synchronize_net() call, this path needs to wait for rcu grace period via
    rcu callback to release basechain hooks if netns is alive because an
    ongoing netlink dump could be in progress (sockets hold a reference on
    the netns).
    
    Note that nf_tables_pre_exit_net() unregisters and releases basechain
    hooks but it is possible to see NETDEV_UNREGISTER at a later stage in
    the netns exit path, eg. veth peer device in another netns:
    
     cleanup_net()
      default_device_exit_batch()
       unregister_netdevice_many_notify()
        notifier_call_chain()
         nf_tables_netdev_event()
          __nft_release_basechain()
    
    In this particular case, same rule of thumb applies: if netns is alive,
    then wait for rcu grace period because netlink dump in the other netns
    could be in progress. Otherwise, if the other netns is going away then
    no netlink dump can be in progress and basechain hooks can be released
    immediately.
    
    While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain
    validation, which should not ever happen.
    
    Fixes: 835b803 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal")
    Signed-off-by: Pablo Neira Ayuso <[email protected]>
    ummakynes committed Nov 7, 2024
    c03d278
  9. virtio_net: Support dynamic rss indirection table size

    When reading/writing virtio_net_ctrl_rss, we get the indirection table
    size from vi->rss_indir_table_size, which is initialized in
    virtnet_probe(). However, the actual size of indirection_table was set
    as VIRTIO_NET_RSS_MAX_TABLE_LEN=128. This collision may cause issues if
    the vi->rss_indir_table_size exceeds 128.
    
    This patch instead uses a dynamic indirection table, allocated with
    vi->rss after vi->rss_indir_table_size is initialized, and frees it in
    virtnet_remove().
    
    In virtnet_commit_rss_command(), the sgs for rss are initialized differently
    from the hash_report case. So indirection_table is not used if !vi->has_rss,
    and we don't need to allocate indirection_table for hash_report-only use.
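
A minimal sketch of sizing the table from the device-reported value instead of a fixed maximum. The struct layout and function names are hypothetical simplifications, not the driver's real data structures.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified RSS state; not the driver's real layout. */
struct rss_state {
    uint16_t indir_table_size;   /* size reported by the device */
    uint16_t *indirection_table; /* allocated to exactly that size */
};

/* Allocate the table once the device-reported size is known. */
static int rss_state_init(struct rss_state *rss, uint16_t table_size, int has_rss)
{
    rss->indir_table_size = 0;
    rss->indirection_table = NULL;
    if (!has_rss)
        return 0; /* hash-report-only use never touches the table */
    if (table_size == 0)
        return -1;

    rss->indirection_table = calloc(table_size, sizeof(*rss->indirection_table));
    if (!rss->indirection_table)
        return -1;
    rss->indir_table_size = table_size;
    return 0;
}

/* Counterpart for the remove path. */
static void rss_state_destroy(struct rss_state *rss)
{
    free(rss->indirection_table);
    rss->indirection_table = NULL;
    rss->indir_table_size = 0;
}
```

With this shape, a device reporting a table larger than 128 entries gets a buffer of matching size, so reads and writes bounded by the reported size stay inside the allocation.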
    
    Fixes: c7114b1 ("drivers/net/virtio_net: Added basic RSS support.")
    Signed-off-by: Philo Lu <[email protected]>
    Signed-off-by: Xuan Zhuo <[email protected]>
    Acked-by: Joe Damato <[email protected]>
    Acked-by: Michael S. Tsirkin <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Philo Lu authored and Paolo Abeni committed Nov 7, 2024
    86a48a0
  10. virtio_net: Add hash_key_length check

    Add hash_key_length check in virtnet_probe() to avoid possible out of
    bound errors when setting/reading the hash key.
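
A minimal sketch of such a probe-time bound check. The constant name and its value of 40 are assumptions made for illustration, as is the helper name; the real limit is whatever size the driver's key buffer has.

```c
#include <stdint.h>

/* Assumed capacity of the driver-side key buffer, for illustration only. */
#define SKETCH_RSS_MAX_KEY_SIZE 40

/* Return 0 if the device-reported key length fits the buffer, -1 otherwise. */
static int check_hash_key_length(uint8_t reported_len)
{
    if (reported_len > SKETCH_RSS_MAX_KEY_SIZE)
        return -1; /* reject at probe time instead of overrunning later */
    return 0;
}
```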
    
    Fixes: c7114b1 ("drivers/net/virtio_net: Added basic RSS support.")
    Signed-off-by: Philo Lu <[email protected]>
    Signed-off-by: Xuan Zhuo <[email protected]>
    Acked-by: Joe Damato <[email protected]>
    Acked-by: Michael S. Tsirkin <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Philo Lu authored and Paolo Abeni committed Nov 7, 2024
    3f7d9c1
  11. virtio_net: Sync rss config to device when virtnet_probe

    During virtnet_probe, the default rss configuration is initialized but
    not committed to the device. This patch fixes this by sending the rss
    command after the device is ready in virtnet_probe. Otherwise, the actual
    rss configuration used by the device can differ from the one the user
    reads from the driver, which may confuse the user.
    
    If committing the command fails, driver rss will be disabled.
    
    Fixes: c7114b1 ("drivers/net/virtio_net: Added basic RSS support.")
    Signed-off-by: Philo Lu <[email protected]>
    Signed-off-by: Xuan Zhuo <[email protected]>
    Acked-by: Joe Damato <[email protected]>
    Acked-by: Michael S. Tsirkin <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Philo Lu authored and Paolo Abeni committed Nov 7, 2024
    dc749b7
  12. virtio_net: Update rss when set queue

    RSS configuration should be updated along with the queue number. In
    particular, it should be updated when (1) rss is enabled and (2) the
    default rss configuration is used without user modification.
    
    During rss command processing, the device updates queue_pairs using
    rss.max_tx_vq. That is, the device updates queue_pairs together with
    rss, so we can skip the separate queue_pairs update
    (VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly.
    
    Also remove the `vi->has_rss ?` check when setting vi->rss.max_tx_vq,
    because this is not used in the other hash_report case.
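
The two conditions above can be sketched as a small decision helper. The function names are hypothetical, and the round-robin fill is an assumed default layout used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Spread table entries round-robin over the active queues. */
static void fill_default_indir(uint16_t *table, size_t size, uint16_t queue_pairs)
{
    for (size_t i = 0; i < size; i++)
        table[i] = (uint16_t)(i % queue_pairs);
}

/*
 * Called when the queue count changes. Returns true if the table was
 * regenerated, false if it was left alone.
 */
static bool update_indir_on_queue_change(uint16_t *table, size_t size,
                                         uint16_t new_queue_pairs,
                                         bool rss_enabled, bool user_configured)
{
    if (!rss_enabled || user_configured || new_queue_pairs == 0)
        return false;
    fill_default_indir(table, size, new_queue_pairs);
    return true;
}
```

A table the user has configured explicitly is never overwritten; only the driver's own default follows the queue count.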
    
    Fixes: c7114b1 ("drivers/net/virtio_net: Added basic RSS support.")
    Signed-off-by: Philo Lu <[email protected]>
    Signed-off-by: Xuan Zhuo <[email protected]>
    Acked-by: Michael S. Tsirkin <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Philo Lu authored and Paolo Abeni committed Nov 7, 2024
    50bfcae
  13. Merge branch 'virtio_net-make-rss-interact-properly-with-queue-number'

    Philo Lu says:
    
    ====================
    virtio_net: Make RSS interact properly with queue number
    
    With this patch set, RSS updates with queue_pairs changing:
    - When virtnet_probe, init default rss and commit
    - When queue_pairs changes _without_ user rss configuration, update rss
      with the new queue number
    - When queue_pairs changes _with_ user rss configuration, keep rss as user
      configured
    
    Patches 1 and 2 fix possible out of bound errors for indir_table and key.
    Patches 3 and 4 add RSS update in probe() and set_queues().
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Paolo Abeni committed Nov 7, 2024
    5d182f7
  14. media: videobuf2-core: copy vb planes unconditionally

    Copy the relevant data from userspace to the vb->planes unconditionally
    as it's possible some of the fields may have changed after the buffer
    has been validated.
    
    Keep the dma_buf_put(planes[plane].dbuf) calls in the first
    `if (!reacquired)` case, in order to be close to the plane validation code
    where the buffers were got in the first place.
    
    Cc: [email protected]
    Fixes: 95af7c0 ("media: videobuf2-core: release all planes first in __prepare_dmabuf()")
    Signed-off-by: Tudor Ambarus <[email protected]>
    Tested-by: Will McVicker <[email protected]>
    Acked-by: Tomasz Figa <[email protected]>
    Signed-off-by: Hans Verkuil <[email protected]>
    ambarus authored and hverkuil committed Nov 7, 2024
    702a47c
  15. net: arc: fix the device for dma_map_single/dma_unmap_single

    ndev->dev and pdev->dev aren't the same device. Use ndev->dev.parent,
    which has a dma_mask; ndev->dev.parent is just pdev->dev.
    Otherwise the following warning is triggered:
    
    [   39.933526] ------------[ cut here ]------------
    [   39.938414] WARNING: CPU: 1 PID: 501 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x90/0x1f8
    
    Fixes: f959dcd ("dma-direct: Fix potential NULL pointer dereference")
    Signed-off-by: David Wu <[email protected]>
    Signed-off-by: Johan Jonker <[email protected]>
    Signed-off-by: Andy Yan <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Johan Jonker authored and Paolo Abeni committed Nov 7, 2024
    71803c1
  16. net: arc: rockchip: fix emac mdio node support

    The binding emac_rockchip.txt was converted to YAML.
    Compared with the original binding, an MDIO subnode was added.
    This makes the driver fail to find the PHY, and given the 'mdio
    has invalid PHY address' message, it is probably looking in the wrong node.
    Fix emac_mdio.c so that it can handle both old and new
    device trees.
    
    Fixes: 1dabb74 ("ARM: dts: rockchip: restyle emac nodes")
    Signed-off-by: Johan Jonker <[email protected]>
    Tested-by: Andy Yan <[email protected]>
    Link: https://lore.kernel.org/r/[email protected]
    Signed-off-by: Andy Yan <[email protected]>
    Reviewed-by: Andrew Lunn <[email protected]>
    Signed-off-by: Paolo Abeni <[email protected]>
    Johan Jonker authored and Paolo Abeni committed Nov 7, 2024
    0a1c7a7
  17. Merge branch 'fix-the-arc-emac-driver'

    Andy Yan says:
    
    ====================
    Fix the arc emac driver
    
    The arc emac driver has been broken for a long time.
    The first breakage happened when a DMA-related fix was introduced in Linux 5.10.
    The second breakage happened when an emac device tree node restyle was
    introduced in Linux 6.1.
    
    These two patches try to make the arc emac work again.
    
    Changes in v2:
    - Add cover letter.
    - Add fix tag.
    - Add more detailed explanation.
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Paolo Abeni <[email protected]>
    Paolo Abeni committed Nov 7, 2024
    5f897f3
  18. ASoC: amd: yc: Support dmic on another model of Lenovo Thinkpad E14 Gen 6
    
    Another model of Thinkpad E14 Gen 6 (21M4)
    needs a quirk entry for the dmic to be detected.
    
    Signed-off-by: Markus Petri <[email protected]>
    Link: https://patch.msgid.link/20241107094020.1050935-1-mp@localhost
    Signed-off-by: Mark Brown <[email protected]>
    Markus Petri authored and broonie committed Nov 7, 2024
    8c21e40
  19. ASoC: stm: Prevent potential division by zero in stm32_sai_mclk_round_rate()
    
    This patch checks if div is less than or equal to zero (div <= 0). If
    div is zero or negative, the function returns -EINVAL, ensuring the
    division operation (*prate / div) is safe to perform.
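
A minimal sketch of the guard described above. The helper name and signature are hypothetical; only the `div <= 0` check returning `-EINVAL` before the division comes from the message.

```c
#include <errno.h>

/* Divide rate by div, refusing a zero or negative divider. */
static long rate_div_checked(unsigned long rate, int div)
{
    if (div <= 0)
        return -EINVAL; /* bail out before the division can fault */
    return (long)(rate / (unsigned long)div);
}
```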
    
    Signed-off-by: Luo Yifan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Luo Yifan authored and broonie committed Nov 7, 2024
    63c1c87
  20. ASoC: stm: Prevent potential division by zero in stm32_sai_get_clk_div()

    This patch checks if div is less than or equal to zero (div <= 0). If
    div is zero or negative, the function returns -EINVAL, ensuring the
    division operation is safe to perform.
    
    Signed-off-by: Luo Yifan <[email protected]>
    Reviewed-by: Olivier Moysan <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Luo Yifan authored and broonie committed Nov 7, 2024
    23569c8
  21. NFSD: Fix READDIR on NFSv3 mounts of ext4 exports

    I noticed that recently, simple operations like "make" started
    failing on NFSv3 mounts of ext4 exports. Network capture shows that
    READDIRPLUS operated correctly but READDIR failed with
    NFS3ERR_INVAL. The vfs_llseek() call returned EINVAL when it was
    passed a non-zero starting directory cookie.
    
    I bisected to commit c689bdd ("nfsd: further centralize
    protocol version checks.").
    
    Turns out that nfsd3_proc_readdir() does not call fh_verify() before
    it calls nfsd_readdir(), so the new fhp->fh_64bit_cookies boolean is
    not set properly. This leaves the NFSD_MAY_64BIT_COOKIE unset when
    the directory is opened.
    
    For ext4, this causes the wrong "max file size" value to be used
    when sanity checking the incoming directory cookie (which is a seek
    offset value).
    
    The fhp->fh_64bit_cookies boolean is /always/ properly initialized
    after nfsd_open() returns. There doesn't seem to be a reason for the
    generic NFSD open helper to handle the f_mode fix-up for
    directories, so just move that to the one caller that tries to open
    an S_IFDIR with NFSD_MAY_64BIT_COOKIE.
    
    Suggested-by: NeilBrown <[email protected]>
    Fixes: c689bdd ("nfsd: further centralize protocol version checks.")
    Reviewed-by: NeilBrown <[email protected]>
    Signed-off-by: Chuck Lever <[email protected]>
    chucklever committed Nov 7, 2024
    bb1fb40
  22. Merge tag 'thunderbolt-for-v6.12-rc7' of ssh://gitolite.kernel.org/pu…

    …b/scm/linux/kernel/git/westeri/thunderbolt into usb-linus
    
    thunderbolt: Fixes for v6.12-rc7
    
    This includes the following USB4/Thunderbolt fixes for v6.12-rc7:
    
      - Fix for retimer enumeration.
      - Fix connection issue with Pluggable UD-4VPD USB4 dock.
    
    Both have been in linux-next with no reported issues.
    
    * tag 'thunderbolt-for-v6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt:
      thunderbolt: Fix connection issue with Pluggable UD-4VPD dock
      thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING
    gregkh committed Nov 7, 2024
    0c08402
  23. drm: panel-orientation-quirks: Make Lenovo Yoga Tab 3 X90F DMI match …

    …less strict
    
    There are 2G and 4G RAM versions of the Lenovo Yoga Tab 3 X90F and it
    turns out that the 2G version has a DMI product name of
    "CHERRYVIEW D1 PLATFORM" whereas the 4G version has
    "CHERRYVIEW C0 PLATFORM". The sys-vendor + product-version checks are
    unique enough that the product-name check is not necessary.
    
    Drop the product-name check so that the existing DMI match for the 4G
    RAM version also matches the 2G RAM version.
    
    Signed-off-by: Hans de Goede <[email protected]>
    Acked-by: Jani Nikula <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    jwrdegoede committed Nov 7, 2024
    052ef64
  24. drm/panthor: Lock XArray when getting entries for the VM

    Similar to commit cac0757 ("drm/panthor: Fix race when converting
    group handle to group object") we need to use the XArray's internal
    locking when retrieving a vm pointer from there.
    
    v2: Removed part of the patch that was trying to protect fetching
    the heap pointer from XArray, as that operation is protected by
    the @pool->lock.
    
    Fixes: 647810e ("drm/panthor: Add the MMU/VM logical block")
    Reported-by: Jann Horn <[email protected]>
    Cc: [email protected]
    Signed-off-by: Liviu Dudau <[email protected]>
    Reviewed-by: Boris Brezillon <[email protected]>
    Reviewed-by: Steven Price <[email protected]>
    Signed-off-by: Steven Price <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    dliviu authored and Steven Price committed Nov 7, 2024
    444fa5b
  25. ASoC: SOF: sof-client-probes-ipc4: Set param_size extension bits

    Write the size of the optional payload of SOF_IPC4_MOD_INIT_INSTANCE
    message to extension param_size-bits.
    
    The previous IPC4 version does not set these bits that should indicate
    the size of the optional payload (struct sof_ipc4_probe_cfg). The old
    firmware side component code works well without these bits, but when
    the probes are converted to use the generic module API, this does not
    work anymore.
    
    Fixes: f562359 ("ASoC: SOF: IPC4: probes: Implement IPC4 ops for probes client device")
    Signed-off-by: Jyri Sarha <[email protected]>
    Reviewed-by: Ranjani Sridharan <[email protected]>
    Reviewed-by: Liam Girdwood <[email protected]>
    Reviewed-by: Bard Liao <[email protected]>
    Signed-off-by: Peter Ujfalusi <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Mark Brown <[email protected]>
    Jyri Sarha authored and broonie committed Nov 7, 2024
    48b8653
  26. Merge tag 'nf-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/…

    …git/netfilter/nf
    
    Pablo Neira Ayuso says:
    
    ====================
    Netfilter fix for net
    
    The following series contains a Netfilter fix:
    
    1) Wait for rcu grace period after netdevice removal is reported via event.
    
    * tag 'nf-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
      netfilter: nf_tables: wait for rcu grace period on net_device removal
    ====================
    
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    kuba-moo committed Nov 7, 2024
    013d2c5
  27. drm/panthor: Be stricter about IO mapping flags

    The current panthor_device_mmap_io() implementation has two issues:
    
    1. For mapping DRM_PANTHOR_USER_FLUSH_ID_MMIO_OFFSET,
       panthor_device_mmap_io() bails if VM_WRITE is set, but does not clear
       VM_MAYWRITE. That means userspace can use mprotect() to make the mapping
       writable later on. This is a classic Linux driver gotcha.
       I don't think this actually has any impact in practice:
       When the GPU is powered, writes to the FLUSH_ID seem to be ignored; and
       when the GPU is not powered, the dummy_latest_flush page provided by the
       driver is deliberately designed to not do any flushes, so the only thing
       writing to the dummy_latest_flush could achieve would be to make *more*
       flushes happen.
    
    2. panthor_device_mmap_io() does not block MAP_PRIVATE mappings (which are
       mappings without the VM_SHARED flag).
       MAP_PRIVATE in combination with VM_MAYWRITE indicates that the VMA has
       copy-on-write semantics, which for VM_PFNMAP are semi-supported but
       fairly cursed.
       In particular, in such a mapping, the driver can only install PTEs
       during mmap() by calling remap_pfn_range() (because remap_pfn_range()
       wants to **store the physical address of the mapped physical memory into
       the vm_pgoff of the VMA**); installing PTEs later on with a fault
       handler (as panthor does) is not supported in private mappings, and so
       if you try to fault in such a mapping, vmf_insert_pfn_prot() splats when
       it hits a BUG() check.
    
    Fix it by clearing the VM_MAYWRITE flag (userspace writing to the FLUSH_ID
    doesn't make sense) and requiring VM_SHARED (copy-on-write semantics for
    the FLUSH_ID don't make sense).
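
The flag handling described in this message can be sketched in plain C. This is only an illustration under stated assumptions: the SK_VM_* constants and sketch_check_ro_mapping() are hypothetical stand-ins defined here, not the kernel's VM_* definitions or the panthor function.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in flag bits for this sketch; the kernel defines the real VM_* flags. */
#define SK_VM_WRITE    0x02UL
#define SK_VM_SHARED   0x08UL
#define SK_VM_MAYWRITE 0x20UL

/*
 * Validate and adjust the flags of a read-only I/O mapping.
 * Returns 0 on success or -EINVAL if the mapping must be refused.
 */
static int sketch_check_ro_mapping(unsigned long *vm_flags)
{
	/* Private (copy-on-write) mappings are refused outright. */
	if (!(*vm_flags & SK_VM_SHARED))
		return -EINVAL;

	/* A mapping that is writable right now is refused. */
	if (*vm_flags & SK_VM_WRITE)
		return -EINVAL;

	/* Drop MAYWRITE so mprotect() cannot add write access later. */
	*vm_flags &= ~SK_VM_MAYWRITE;
	return 0;
}
```

The point of the last step is that refusing VM_WRITE at mmap() time is not enough on its own; the "may write" permission has to be removed as well.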
    
    Reproducers for both scenarios are in the notes of my patch on the mailing
    list; I tested that these bugs exist on a Rock 5B machine.
    
    Note that I only compile-tested the patch, I haven't tested it; I don't
    have a working kernel build setup for the test machine yet. Please test it
    before applying it.
    
    Cc: [email protected]
    Fixes: 5fe909c ("drm/panthor: Add the device logical block")
    Signed-off-by: Jann Horn <[email protected]>
    Reviewed-by: Boris Brezillon <[email protected]>
    Reviewed-by: Liviu Dudau <[email protected]>
    Reviewed-by: Steven Price <[email protected]>
    Signed-off-by: Steven Price <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    thejh authored and Steven Price committed Nov 7, 2024
    f432a16
  28. proc/softirqs: replace seq_printf with seq_put_decimal_ull_width

    seq_printf is costly. On a system with n CPUs, reading /proc/softirqs
    yields 10*n decimal values, and the extra cost of parsing the format
    string grows linearly with the number of CPUs. Replacing seq_printf with
    seq_put_decimal_ull_width gives a significant performance improvement:
    on an 8-CPU system, reading /proc/softirqs shows a ~40% performance
    gain with this patch.
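
To illustrate why a dedicated decimal writer can be cheaper than a printf-style call, here is a userspace sketch that emits a right-aligned decimal number without parsing any format string. The function sketch_put_decimal_width() is hypothetical and is not the kernel helper; in particular it omits the delimiter argument and writes into a caller-supplied buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Write `val` in decimal, right-aligned with spaces to at least `width`
 * characters, without parsing a format string. Returns the number of
 * characters written (excluding the NUL), or 0 if `cap` is too small.
 */
static size_t sketch_put_decimal_width(char *buf, size_t cap,
				       unsigned long long val, size_t width)
{
	/* 3 chars per byte is more than enough for the decimal digits. */
	char digits[3 * sizeof(unsigned long long)];
	size_t n = 0, len, pad, i;

	/* Produce digits least-significant first. */
	do {
		digits[n++] = (char)('0' + val % 10);
		val /= 10;
	} while (val);

	pad = width > n ? width - n : 0;
	len = pad + n;
	if (len + 1 > cap)
		return 0;

	memset(buf, ' ', pad);
	for (i = 0; i < n; i++)
		buf[pad + i] = digits[n - 1 - i];
	buf[len] = '\0';
	return len;
}
```

The work per value is a fixed digit loop plus padding, so the per-value cost no longer includes interpreting a format string.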
    
    Signed-off-by: David Wang <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    zq-david-wang authored and torvalds committed Nov 7, 2024
    84b9749
  29. Merge tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/li…

    …nux/kernel/git/ukleinek/linux
    
    Pull pwm fix from Uwe Kleine-König:
     "Fix period setting in imx-tpm driver and a maintainer update
    
      Erik Schumacher found and fixed a problem in the calculation of the
      PWM period setting yielding too long periods. Trevor Gamblin - who
      already cared about mainlining the pwm-axi-pwmgen driver - stepped
      forward as an additional reviewer.
    
      Thanks to Erik and Trevor"
    
    * tag 'pwm/for-6.12-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux:
      MAINTAINERS: add self as reviewer for AXI PWM GENERATOR
      pwm: imx-tpm: Use correct MODULO value for EPWM mode
    torvalds committed Nov 7, 2024
    80fb253
  30. net/smc: Fix lookup of netdev by using ib_device_get_netdev()

    The SMC-R variant of the SMC protocol used a direct call to the function
    ib_device_ops.get_netdev() to look up the netdev. When we used the mlx5
    device driver to run SMC-R, it failed to find a device, because in mlx5_ib
    the internal net device management for retrieving net devices was replaced
    by a common interface ib_device_get_netdev() in commit 8d159eb
    ("RDMA/mlx5: Use IB set_netdev and get_netdev functions").
    
    Since such direct accesses to the internal net device management are not
    recommended at all, update the SMC-R code to use the proper API,
    ib_device_get_netdev().
    
    Fixes: 5490357 ("net/smc: allow pnetid-less configuration")
    Reported-by: Aswin K <[email protected]>
    Reviewed-by: Gerd Bayer <[email protected]>
    Reviewed-by: Halil Pasic <[email protected]>
    Reviewed-by: Simon Horman <[email protected]>
    Reviewed-by: Dust Li <[email protected]>
    Reviewed-by: Wen Gu <[email protected]>
    Reviewed-by: Zhu Yanjun <[email protected]>
    Reviewed-by: D. Wythe <[email protected]>
    Signed-off-by: Wenjia Zhang <[email protected]>
    Reviewed-by: Leon Romanovsky <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Wenjia Zhang authored and kuba-moo committed Nov 7, 2024
    de88df0
  31. rxrpc: Fix missing locking causing hanging calls

    If a call gets aborted (e.g. because kafs saw a signal) between it being
    queued for connection and the I/O thread picking up the call, the abort
    will be prioritised over the connection and it will be removed from
    local->new_client_calls by rxrpc_disconnect_client_call() without a lock
    being held.  This may cause other calls on the list to disappear if a race
    occurs.
    
    Fix this by taking the client_call_lock when removing a call from whatever
    list its ->wait_link happens to be on.
    
    Signed-off-by: David Howells <[email protected]>
    cc: [email protected]
    Reported-by: Marc Dionne <[email protected]>
    Fixes: 9d35d88 ("rxrpc: Move client call connection to the I/O thread")
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    dhowells authored and kuba-moo committed Nov 7, 2024
    fc9de52
  32. net/smc: do not leave a dangling sk pointer in __smc_create()

    Thanks to commit 4bbd360 ("socket: Print pf->create() when
    it does not clear sock->sk on failure."), syzbot found an issue with AF_SMC:
    
    smc_create must clear sock->sk on failure, family: 43, type: 1, protocol: 0
     WARNING: CPU: 0 PID: 5827 at net/socket.c:1565 __sock_create+0x96f/0xa30 net/socket.c:1563
    Modules linked in:
    CPU: 0 UID: 0 PID: 5827 Comm: syz-executor259 Not tainted 6.12.0-rc6-next-20241106-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
     RIP: 0010:__sock_create+0x96f/0xa30 net/socket.c:1563
    Code: 03 00 74 08 4c 89 e7 e8 4f 3b 85 f8 49 8b 34 24 48 c7 c7 40 89 0c 8d 8b 54 24 04 8b 4c 24 0c 44 8b 44 24 08 e8 32 78 db f7 90 <0f> 0b 90 90 e9 d3 fd ff ff 89 e9 80 e1 07 fe c1 38 c1 0f 8c ee f7
    RSP: 0018:ffffc90003e4fda0 EFLAGS: 00010246
    RAX: 099c6f938c7f4700 RBX: 1ffffffff1a595fd RCX: ffff888034823c00
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: 00000000ffffffe9 R08: ffffffff81567052 R09: 1ffff920007c9f50
    R10: dffffc0000000000 R11: fffff520007c9f51 R12: ffffffff8d2cafe8
    R13: 1ffffffff1a595fe R14: ffffffff9a789c40 R15: ffff8880764298c0
    FS:  000055557b518380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007fa62ff43225 CR3: 0000000031628000 CR4: 00000000003526f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
      sock_create net/socket.c:1616 [inline]
      __sys_socket_create net/socket.c:1653 [inline]
      __sys_socket+0x150/0x3c0 net/socket.c:1700
      __do_sys_socket net/socket.c:1714 [inline]
      __se_sys_socket net/socket.c:1712 [inline]
    
    For reference, see commit 2d859af ("Merge branch
    'do-not-leave-dangling-sk-pointers-in-pf-create-functions'")
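
The rule the warning enforces (a create function must not leave the owner pointing at a freed object when it fails) can be sketched in userspace C. All names here (struct sk_sketch, struct sock_sketch, sketch_create()) are hypothetical and only mirror the sock->sk relationship; this is not the SMC code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical miniature types; they only mirror the sock->sk relationship. */
struct sk_sketch { int proto; };
struct sock_sketch { struct sk_sketch *sk; };

/* Simulated setup step that fails for negative protocol numbers. */
static int sketch_init_proto(struct sk_sketch *sk, int proto)
{
	if (proto < 0)
		return -EINVAL;
	sk->proto = proto;
	return 0;
}

static int sketch_create(struct sock_sketch *sock, int proto)
{
	int rc;

	sock->sk = malloc(sizeof(*sock->sk));
	if (!sock->sk)
		return -ENOMEM;

	rc = sketch_init_proto(sock->sk, proto);
	if (rc) {
		/* Free the object and clear the owner's pointer together. */
		free(sock->sk);
		sock->sk = NULL;
	}
	return rc;
}
```

After a failed sketch_create(), the caller can rely on sock->sk being NULL instead of a stale address.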
    
    Fixes: d25a92c ("net/smc: Introduce IPPROTO_SMC")
    Signed-off-by: Eric Dumazet <[email protected]>
    Cc: Ignat Korchagin <[email protected]>
    Cc: D. Wythe <[email protected]>
    Cc: Dust Li <[email protected]>
    Reviewed-by: Kuniyuki Iwashima <[email protected]>
    Reviewed-by: Wenjia Zhang <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Eric Dumazet authored and kuba-moo committed Nov 7, 2024
    d293958
  33. drivers: net: ionic: add missed debugfs cleanup to ionic_probe() erro…

    …r path
    
    The ionic_setup_one() creates a debugfs entry for ionic upon
    successful execution. However, the ionic_probe() does not
    release the dentry before returning, resulting in a memory
    leak.
    
    To fix this bug, we add the ionic_debugfs_del_dev() to release
    the resources in a timely manner before returning.
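
The shape of such an error-path fix can be sketched with the usual goto-unwind pattern. This is a userspace illustration with hypothetical names (struct dev_sketch, sketch_probe(), and so on); a boolean stands in for the debugfs entry, and the later failing step is simulated by a parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; a flag stands in for the debugfs entry. */
struct dev_sketch { bool debugfs_present; };

static int sketch_setup_one(struct dev_sketch *d)
{
	d->debugfs_present = true;	/* created as a side effect of setup */
	return 0;
}

static void sketch_debugfs_del(struct dev_sketch *d)
{
	d->debugfs_present = false;
}

/* `later_step_rc` simulates the result of a step that runs after setup. */
static int sketch_probe(struct dev_sketch *d, int later_step_rc)
{
	int err = sketch_setup_one(d);

	if (err)
		return err;

	err = later_step_rc;
	if (err)
		goto err_out_debugfs;

	return 0;

err_out_debugfs:
	/* Undo what setup created before reporting the failure. */
	sketch_debugfs_del(d);
	return err;
}
```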
    
    Fixes: 0de38d9 ("ionic: extract common bits from ionic_probe")
    Signed-off-by: Wentao Liang <[email protected]>
    Acked-by: Shannon Nelson <[email protected]>
    Link: https://patch.msgid.link/[email protected]
    Signed-off-by: Jakub Kicinski <[email protected]>
    Wentao-Liang authored and kuba-moo committed Nov 7, 2024
    71712cf
  34. Merge tag 'nvme-6.12-2024-11-07' of git://git.infradead.org/nvme into…

    … block-6.12
    
    Pull NVMe fix from Keith:
    
    "nvme fix for Linux 6.13
    
     - Use correct list traversal for srcu lists (Breno)"
    
    * tag 'nvme-6.12-2024-11-07' of git://git.infradead.org/nvme:
      nvme/host: Fix RCU list traversal to use SRCU primitive
    axboe committed Nov 7, 2024
    52ff8e9
  35. Merge tag 'net-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/netdev/net
    
    Pull networking fixes from Jakub Kicinski:
     "Including fixes from can and netfilter.
    
      Things are slowing down quite a bit, mostly driver fixes here. No
      known ongoing investigations.
    
      Current release - new code bugs:
    
       - eth: ti: am65-cpsw:
          - fix multi queue Rx on J7
          - fix warning in am65_cpsw_nuss_remove_rx_chns()
    
      Previous releases - regressions:
    
       - mptcp: do not require admin perm to list endpoints, got missed in a
         refactoring
    
       - mptcp: use sock_kfree_s instead of kfree
    
      Previous releases - always broken:
    
       - sctp: properly validate chunk size in sctp_sf_ootb() fix OOB access
    
       - virtio_net: make RSS interact properly with queue number
    
       - can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation
    
       - can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing
         configuration when switching CAN modes
    
      Misc:
    
       - revert earlier hns3 fixes, they were ignoring IOMMU abstractions
         and need to be reworked
    
       - can: {cc770,sja1000}_isa: allow building on x86_64"
    
    * tag 'net-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
      drivers: net: ionic: add missed debugfs cleanup to ionic_probe() error path
      net/smc: do not leave a dangling sk pointer in __smc_create()
      rxrpc: Fix missing locking causing hanging calls
      net/smc: Fix lookup of netdev by using ib_device_get_netdev()
      net: arc: rockchip: fix emac mdio node support
      net: arc: fix the device for dma_map_single/dma_unmap_single
      virtio_net: Update rss when set queue
      virtio_net: Sync rss config to device when virtnet_probe
      virtio_net: Add hash_key_length check
      virtio_net: Support dynamic rss indirection table size
      netfilter: nf_tables: wait for rcu grace period on net_device removal
      net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case
      net: vertexcom: mse102x: Fix possible double free of TX skb
      mptcp: use sock_kfree_s instead of kfree
      mptcp: no admin perm to list endpoints
      net: phy: ti: add PHY_RST_AFTER_CLK_EN flag
      net: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns()
      net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7
      net: hns3: fix kernel crash when uninstalling driver
      Revert "Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'"
      ...
    torvalds committed Nov 7, 2024
    bfc64d9
  36. bcachefs: Fix null ptr deref in bucket_gen_get()

    bucket_gen() checks if we're looking up a valid bucket and returns NULL
    otherwise, but bucket_gen_get() was failing to check; other callers were
    correct.
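
A small sketch of the pattern, assuming a hypothetical generation table (struct gen_table) and hypothetical helper names; the real bcachefs structures and the value it returns for an invalid bucket may differ. The lookup returns NULL for an out-of-range bucket and the getter checks for that before dereferencing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature device: a table of per-bucket generation numbers. */
struct gen_table {
	const uint8_t *gens;
	size_t nr;
};

/* Returns a pointer to the generation of bucket `b`, or NULL if out of range. */
static const uint8_t *sketch_bucket_gen(const struct gen_table *t, size_t b)
{
	if (b >= t->nr)
		return NULL;
	return &t->gens[b];
}

/* Returns the generation (0..255), or -1 for an invalid bucket. */
static int sketch_bucket_gen_get(const struct gen_table *t, size_t b)
{
	const uint8_t *gen = sketch_bucket_gen(t, b);

	return gen ? *gen : -1;
}
```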
    
    Also do a bit of cleanup on callers.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Nov 7, 2024
    fd00045
  37. bcachefs: Fix error handling in bch2_btree_node_prefetch()

    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Nov 7, 2024
    72acab3
  38. bcachefs: Ancient versions with bad bkey_formats are no longer supported

    Syzbot found an assertion pop, by generating an ancient filesystem
    version with an invalid bkey_format (with fields that can overflow) as
    well as packed keys that aren't representable unpacked.
    
    This breaks key comparisons in all sorts of painful ways.
    
    Filesystems have been automatically rewriting nodes with such invalid
    formats for years; we can safely drop support for them.
    
    Reported-by: [email protected]
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Nov 7, 2024
    d335bb3
  39. bcachefs: Fix topology errors on split after merge

    If a btree split picks a pivot that's being deleted by a btree node
    merge, we're going to have problems.
    
    Fix this by checking if the pivot is being deleted, the same as we check
    for deletions in journal replay keys.
    
    Found by single_device.ktest small_nodes.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Nov 7, 2024
    cec136d
  40. bcachefs: Ensure BCH_FS_may_go_rw is set before exiting recovery

    If BCH_FS_may_go_rw is not yet set, it indicates to the transaction
    commit path that updates should be done via the list of journal replay
    keys.
    
    This must be set before multithreaded use commences.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Nov 7, 2024
    ef4f6c3
  41. bcachefs: add check NULL return of bio_kmalloc in journal_read_bucket

    bio_kmalloc() may return NULL, which would cause a NULL pointer
    dereference. Add a check for a NULL return from bio_kmalloc() in
    journal_read_bucket().
    
    Signed-off-by: Pei Xiao <[email protected]>
    Fixes: ac10a96 ("bcachefs: Some fixes for building in userspace")
    Signed-off-by: Kent Overstreet <[email protected]>
    Pei Xiao authored and Kent Overstreet committed Nov 7, 2024
    93d53f1
  42. bcachefs: check the invalid parameter for perf test

    The perf_test does not check whether the number of iterations or
    threads is zero. If nr_thread is 0, the perf test keeps waiting for a
    wakeup. If the iteration count is 0, it causes a division-by-zero
    exception. This can be reproduced by:
      echo "rand_insert 0 1" > /sys/fs/bcachefs/${uuid}/perf_test
    or
      echo "rand_insert 1 0" > /sys/fs/bcachefs/${uuid}/perf_test
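
A sketch of the kind of validation this calls for, assuming the input has the form "<test name> <iterations> <threads>" as in the reproducer lines. The parser sketch_parse_perf_test() is hypothetical and is not how bcachefs parses the sysfs write; it only shows both counts being rejected when zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Parse "<name> <iterations> <threads>" and reject zero counts.
 * Returns 0 on success, -EINVAL on malformed or zero input.
 */
static int sketch_parse_perf_test(const char *line,
				  unsigned long long *nr,
				  unsigned int *threads)
{
	char name[32];

	if (sscanf(line, "%31s %llu %u", name, nr, threads) != 3)
		return -EINVAL;

	/* Zero iterations would divide by zero; zero threads would wait forever. */
	if (*nr == 0 || *threads == 0)
		return -EINVAL;

	return 0;
}
```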
    
    Fixes: 1c6fdbd ("bcachefs: Initial commit")
    Signed-off-by: Hongbo Li <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    Hongbo Li authored and Kent Overstreet committed Nov 7, 2024
    9bb3385
  43. bcachefs: btree_cache.freeable list fixes

    When allocating new btree nodes, we were leaving them on the freeable
    list - unlocked - allowing them to be reclaimed: ouch.
    
    Additionally, bch2_btree_node_free_never_used() ->
    bch2_btree_node_hash_remove was putting it on the freelist, while
    bch2_btree_node_free_never_used() was putting it back on the btree
    update reserve list - ouch.
    
    Originally, the code was written to always keep btree nodes on a list -
    live or freeable - and this worked when new nodes were kept locked.
    
    But now with the cycle detector, we can't keep nodes locked that aren't
    tracked by the cycle detector; and this is fine as long as they're not
    reachable.
    
    We also have better and more robust leak detection now, with memory
    allocation profiling, so the original justification no longer applies.
    
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Nov 7, 2024
    baefd3f
  44. bcachefs: Change OPT_STR max to be 1 less than the size of choices array

    Change OPT_STR max value to be 1 less than the "ARRAY_SIZE" of "_choices"
    array. As a result, remove -1 from (opt->max-1) in bch2_opt_to_text.
    
    The "_choices" array is a null-terminated array, so computing the maximum
    using "ARRAY_SIZE" without subtracting 1 yields an incorrect result. Since
    bch2_opt_validate doesn't subtract 1, as bch2_opt_to_text does, values
    bigger than the actual maximum would pass through option validation.
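
The off-by-one can be shown with a tiny example. The choices array and the names SK_ARRAY_SIZE, SKETCH_MAX and sketch_opt_valid() are hypothetical; the sketch only demonstrates that the element count of a NULL-terminated array includes the terminator, so the exclusive upper bound for valid indices is one less.

```c
#include <assert.h>
#include <stddef.h>

#define SK_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical option choices; the trailing NULL terminates the list. */
static const char * const sketch_choices[] = { "none", "lz4", "gzip", NULL };

/* Exclusive upper bound on valid indices: the terminator is not a choice. */
#define SKETCH_MAX (SK_ARRAY_SIZE(sketch_choices) - 1)

/* Nonzero if `v` indexes a real choice. */
static int sketch_opt_valid(size_t v)
{
	return v < SKETCH_MAX;
}
```

With the bound taken from the raw array size instead, index 3 would pass validation even though it refers to the NULL terminator.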
    
    Reported-by: [email protected]
    Closes: https://syzkaller.appspot.com/bug?extid=bee87a0c3291c06aa8c6
    Fixes: 63c4b25 ("bcachefs: Better superblock opt validation")
    Suggested-by: Kent Overstreet <[email protected]>
    Signed-off-by: Piotr Zalewski <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    JungerBoyo authored and Kent Overstreet committed Nov 7, 2024
    f9f0a53
  45. bcachefs: Fix UAF in __promote_alloc() error path

    If we error in data_update_init() after adding to the rhashtable of
    outstanding promotes, kfree_rcu() is required.
    
    Reported-by: Reed Riley <[email protected]>
    Signed-off-by: Kent Overstreet <[email protected]>
    Kent Overstreet committed Nov 7, 2024
    8440da9
  46. mm/page_alloc: keep track of free highatomic

    OOM kills due to vastly overestimated free highatomic reserves were
    observed:
    
      ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
      Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
      Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
    
    The second line above shows that the OOM kill was due to the following
    condition:
    
      free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
    
    And the third line shows there were no free pages in any
    MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'. 
    Therefore __zone_watermark_unusable_free() underestimated the usable free
    memory by over 1GB, which resulted in the unnecessary OOM kill above.
    
    The comment in __zone_watermark_unusable_free() warns about the potential
    risk, i.e.,
    
      If the caller does not have rights to reserves below the min
      watermark then subtract the high-atomic reserves. This will
      over-estimate the size of the atomic reserve but it avoids a search.
    
    However, it is possible to keep track of free pages in reserved highatomic
    pageblocks with a new per-zone counter nr_free_highatomic protected by the
    zone lock, to avoid a search when calculating the usable free memory.  And
    the cost would be minimal, i.e., simple arithmetic in the highatomic
    alloc/free/move paths.
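
The arithmetic from the OOM report can be replayed in a few lines. This is a deliberately simplified sketch: sketch_below_min() is a hypothetical helper, and the real watermark check considers more terms than the three used here. All values are in kB and are taken from the report quoted in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the usable free memory falls below the min watermark (all in kB). */
static bool sketch_below_min(long free_kb, long unusable_kb, long min_kb)
{
	return free_kb - unusable_kb < min_kb;
}
```

Subtracting the whole reserve (1073152 kB) from the 1482936 kB of free memory leaves 409784 kB, which is below the 410416 kB min watermark. Subtracting only the free pages actually sitting in highatomic pageblocks (none in this report) leaves the full 1482936 kB, which is well above it.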
    
    Note that since nr_free_highatomic can be relatively small, using a
    per-cpu counter might cause too much drift and defeat its purpose, in
    addition to the extra memory overhead.
    
    Depends on e0932b6 ("mm: page_alloc: consolidate free page accounting") - see [1]
    
    [[email protected]: s/if/else if/, per Johannes, stealth whitespace tweak]
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected] [1]
    Fixes: 0aaa29a ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
    Signed-off-by: Yu Zhao <[email protected]>
    Reported-by: Link Lin <[email protected]>
    Acked-by: David Rientjes <[email protected]>
    Acked-by: Vlastimil Babka <[email protected]>
    Acked-by: Johannes Weiner <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    yuzhaogoogle authored and akpm00 committed Nov 7, 2024
    c928807
  47. objpool: fix to make percpu slot allocation more robust

    Since (gfp & GFP_ATOMIC) == GFP_ATOMIC is true for GFP_KERNEL | __GFP_HIGH,
    the code will use kmalloc if the user specifies that combination.  The
    reason for combining __vmalloc_node() and kmalloc_node() is that vmalloc
    does not support all GFP flags, especially GFP_ATOMIC.  So we should check
    whether (gfp & (GFP_ATOMIC | GFP_KERNEL)) != GFP_ATOMIC for vmalloc first.
    This ensures the caller can sleep.  And for robustness, even if vmalloc
    fails, it should retry with kmalloc.
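
The difference between the two mask tests can be checked with stand-in flag bits. The SK_GFP_* values are illustrative assignments chosen for this sketch and are not the kernel's numeric values; the sketch assumes only that GFP_ATOMIC is the "high priority" bit plus the kswapd-reclaim bit, and that GFP_KERNEL is both reclaim bits plus the IO and FS bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit assignments only; the kernel's real values differ. */
#define SK_GFP_HIGH            0x01u
#define SK_GFP_KSWAPD_RECLAIM  0x02u
#define SK_GFP_DIRECT_RECLAIM  0x04u
#define SK_GFP_IO              0x08u
#define SK_GFP_FS              0x10u

#define SK_GFP_ATOMIC (SK_GFP_HIGH | SK_GFP_KSWAPD_RECLAIM)
#define SK_GFP_KERNEL (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM | \
		       SK_GFP_IO | SK_GFP_FS)

/* Old test: true for any mask that merely contains the atomic bits. */
static bool sketch_old_is_atomic(unsigned int gfp)
{
	return (gfp & SK_GFP_ATOMIC) == SK_GFP_ATOMIC;
}

/* New test: true unless the mask is exactly atomic within these bits. */
static bool sketch_may_use_vmalloc(unsigned int gfp)
{
	return (gfp & (SK_GFP_ATOMIC | SK_GFP_KERNEL)) != SK_GFP_ATOMIC;
}
```

For the sleepable combination of GFP_KERNEL plus the high-priority bit, the old test reports "atomic" while the new test still allows the vmalloc attempt.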
    
    Link: https://lkml.kernel.org/r/173008598713.1262174.2959179484209897252.stgit@mhiramat.roam.corp.google.com
    Fixes: aff1871 ("objpool: fix choosing allocation for percpu slots")
    Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
    Reported-by: Linus Torvalds <[email protected]>
    Closes: https://lore.kernel.org/all/CAHk-=whO+vSH+XVRio8byJU8idAWES0SPGVZ7KAVdc4qrV0VUA@mail.gmail.com/
    Cc: Leo Yan <[email protected]>
    Cc: Linus Torvalds <[email protected]>
    Cc: Matt Wu <[email protected]>
    Cc: Mikel Rychliski <[email protected]>
    Cc: Steven Rostedt (Google) <[email protected]>
    Cc: Viktor Malik <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    mhiramat authored and akpm00 committed Nov 7, 2024
    cb6fcef
  48. mm/mlock: set the correct prev on failure

    After commit 94d7d92 ("mm: abstract the vma_merge()/split_vma()
    pattern for mprotect() et al."), if vma_modify_flags() returns an error,
    the vma is set to an error code.  This leads to an invalid prev being
    returned.
    
    Generally this shouldn't matter as the caller should treat an error as
    indicating state is now invalidated, however unfortunately
    apply_mlockall_flags() does not check for errors and assumes that
    mlock_fixup() correctly maintains prev even if an error were to occur.
    
    This patch fixes that assumption.
    
    [[email protected]: provide a better fix and rephrase the log]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 94d7d92 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.")
    Signed-off-by: Wei Yang <[email protected]>
    Reviewed-by: Lorenzo Stoakes <[email protected]>
    Reviewed-by: Liam R. Howlett <[email protected]>
    Cc: Vlastimil Babka <[email protected]>
    Cc: Jann Horn <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    RichardWeiYang authored and akpm00 committed Nov 7, 2024
    faa242b
  49. mm/damon/core: handle zero {aggregation,ops_update} intervals

    Patch series "mm/damon/core: fix handling of zero non-sampling intervals".
    
    DAMON's internal interval accounting logic does not correctly handle
    zero-valued non-sampling intervals because of a wrong assumption.  This
    could cause unexpected monitoring behavior, and even an infinite hang of
    DAMON sysfs interface user threads in the case of a zero aggregation
    interval.  Fix those by updating the interval accounting logic.  For
    details of the root cause and solutions, please refer to the commit
    messages of the fixes.
    
    
    This patch (of 2):
    
    DAMON's logic for deciding whether it is time to do an aggregation or an
    ops update assumes next_{aggregation,ops_update}_sis is always set larger
    than the current passed_sample_intervals.  It therefore further assumes
    that incrementing passed_sample_intervals every sampling interval will
    eventually make it reach next_{aggregation,ops_update}_sis.  The logic
    hence takes the action and updates next_{aggregation,ops_update}_sis
    only if passed_sample_intervals is equal to the respective count.
    
    If the aggregation interval or the ops update interval is zero, however,
    next_aggregation_sis or next_ops_update_sis, respectively, is set equal
    to the current passed_sample_intervals.  And passed_sample_intervals is
    incremented before the next_{aggregation,ops_update}_sis check.  Hence,
    passed_sample_intervals becomes larger than
    next_{aggregation,ops_update}_sis, and the logic decides it is not time
    to take the action or update next_{aggregation,ops_update}_sis, until an
    overflow happens.  In other words, DAMON effectively stops doing
    aggregations or ops updates forever, and users cannot get monitoring
    results.
    
    Based on the documentation and common sense, a reasonable behavior for
    such inputs is to do an aggregation and an ops update every sampling
    interval.  Handle the case by removing the assumption.
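The failure mode described above can be reproduced with a small userspace simulation. This is a sketch only: count_aggregations() and its parameters are hypothetical, and using ">=" is just one way of dropping the assumption, not necessarily the exact change made by the patch.

```c
/*
 * Count how many aggregations fire over `ticks` sampling intervals.
 * `interval_sis` is the aggregation interval in units of sampling intervals.
 * `use_ge` selects a ">=" comparison instead of the "==" comparison that
 * relies on next always being larger than passed.
 */
static unsigned long count_aggregations(unsigned long interval_sis,
					unsigned long ticks, int use_ge)
{
	unsigned long passed = 0;			/* passed_sample_intervals */
	unsigned long next = passed + interval_sis;	/* next_aggregation_sis */
	unsigned long fired = 0;

	for (unsigned long i = 0; i < ticks; i++) {
		passed++;	/* incremented before the check, as described */
		if (use_ge ? passed >= next : passed == next) {
			fired++;
			next = passed + interval_sis;
		}
	}
	return fired;
}
```

With a zero interval, the "==" variant never fires within the simulated ticks, while the ">=" variant fires on every sampling interval. For non-zero intervals both variants behave the same.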
    
    Note that this can cause a real problem for DAMON sysfs interface users
    in the case of a zero aggregation interval.  When a user starts DAMON
    with a zero aggregation interval and requests online DAMON parameter
    tuning via the DAMON sysfs interface, the request is handled by the
    aggregation callback.  Until the callback finishes the work, the user
    who requested the online tuning just waits.  Hence, the user will be
    stuck until passed_sample_intervals overflows.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 4472edf ("mm/damon/core: use number of passed access sampling as a timer")
    Signed-off-by: SeongJae Park <[email protected]>
    Cc: <[email protected]>	[6.7.x]
    Signed-off-by: Andrew Morton <[email protected]>
    sjp38 authored and akpm00 committed Nov 7, 2024
    3488af0
  50. mm/damon/core: handle zero schemes apply interval

    DAMON's logic for deciding whether it is time to apply DAMOS schemes
    assumes next_apply_sis is always set larger than the current
    passed_sample_intervals.  It therefore assumes that continuously
    incrementing passed_sample_intervals will eventually make it reach
    next_apply_sis.  The logic hence applies the scheme and updates
    next_apply_sis only if passed_sample_intervals is equal to
    next_apply_sis.
    
    If the schemes apply interval is set to zero, however, next_apply_sis is
    set equal to the current passed_sample_intervals.  And
    passed_sample_intervals is incremented before the next_apply_sis check.
    Hence, passed_sample_intervals becomes larger than next_apply_sis, and
    the logic decides it is not time to apply schemes and update
    next_apply_sis.  In other words, DAMON stops applying schemes until
    passed_sample_intervals overflows.
    
    Based on the documentation and common sense, a reasonable behavior for
    such inputs would be to apply the schemes every sampling interval.
    Handle the case by removing the assumption.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 42f994b ("mm/damon/core: implement scheme-specific apply interval")
    Signed-off-by: SeongJae Park <[email protected]>
    Cc: <[email protected]>	[6.7.x]
    Signed-off-by: Andrew Morton <[email protected]>
    sjp38 authored and akpm00 committed Nov 7, 2024
    8e7bde6
  51. mm/damon/core: avoid overflow in damon_feed_loop_next_input()

    damon_feed_loop_next_input() is inefficient and prone to overflows.
    Specifically, the 'score_goal_diff_bp' calculation can overflow when
    'score' is high.  The calculation is in fact unnecessary because 'goal'
    is a constant of value 10,000.  The calculation of 'compensation' is
    also prone to overflow.  The final calculation of the return value is
    again prone to overflow when the current score is under-achieving the
    target.
    
    Add handling for two corner cases at the beginning of the function to
    make the body easier to read, and rewrite the body of the function to
    avoid overflows and the unnecessary bp value calculation.
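As a generic illustration of overflow-avoiding arithmetic of the kind this message talks about, the sketch below scales a value by a ratio and adds with saturation. The helper names are hypothetical and this is not the body of damon_feed_loop_next_input(); it assumes den is non-zero and num is no larger than den.

```c
#include <limits.h>

/*
 * Compute value * num / den without overflowing the intermediate product.
 * Assumes den != 0 and num <= den (e.g. a distance from a goal, over the goal).
 */
static unsigned long scale_no_overflow(unsigned long value,
				       unsigned long num, unsigned long den)
{
	if (num == 0)
		return 0;
	if (value < ULONG_MAX / num)
		return value * num / den;	/* product fits: keep precision */
	return value / den * num;		/* divide first, lose some precision */
}

/* Add two values, clamping at ULONG_MAX instead of wrapping around. */
static unsigned long add_saturating(unsigned long a, unsigned long b)
{
	return a < ULONG_MAX - b ? a + b : ULONG_MAX;
}
```

Dividing first trades a little precision for safety, which only matters for inputs large enough that the product would not fit anyway.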
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 9294a03 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning")
    Signed-off-by: SeongJae Park <[email protected]>
    Reported-by: Guenter Roeck <[email protected]>
    Closes: https://lore.kernel.org/[email protected]
    Tested-by: Guenter Roeck <[email protected]>
    Cc: <[email protected]>	[6.8.x]
    Signed-off-by: Andrew Morton <[email protected]>
    sjp38 authored and akpm00 committed Nov 7, 2024
    4401e9d
  52. mm: fix docs for the kernel parameter thp_anon=

    If we add ``thp_anon=32,64K:always`` to the kernel command line, we
    will see the following error:
    
    [    0.000000] huge_memory: thp_anon=32,64K:always: error parsing string, ignoring setting
    
    This happens because the correct format isn't ``thp_anon=<size>,<size>[KMG]:<state>``,
    as [KMG] must follow each number to specify its unit. So, the correct
    format is ``thp_anon=<size>[KMG],<size>[KMG]:<state>``.
    
    Therefore, adjust the documentation to reflect the correct format of the
    parameter ``thp_anon=``.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: dd4d30d ("mm: override mTHP "enabled" defaults at kernel cmdline")
    Signed-off-by: Maíra Canal <[email protected]>
    Acked-by: Barry Song <[email protected]>
    Acked-by: David Hildenbrand <[email protected]>
    Cc: Baolin Wang <[email protected]>
    Cc: Hugh Dickins <[email protected]>
    Cc: Jonathan Corbet <[email protected]>
    Cc: Lance Yang <[email protected]>
    Cc: Ryan Roberts <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    mairacanal authored and akpm00 committed Nov 7, 2024
    652e1a5
  53. selftests: hugetlb_dio: check for initial conditions to skip in the start
    
    The test should be skipped if the initial conditions aren't fulfilled at
    the start, instead of failing and outputting non-compliant TAP logs.
    This kind of failure pollutes the results.  The initial conditions are:
    
    - The test should only execute if a file in /tmp can be allocated.
    - The test should only execute if huge pages are free.
    
    Before:
    TAP version 13
    1..4
    Bail out! Error opening file
    : Read-only file system (30)
     # Planned tests != run tests (4 != 0)
     # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0
    
    After:
    TAP version 13
    1..0 # SKIP Unable to allocate file: Read-only file system
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Muhammad Usama Anjum <[email protected]>
    Fixes: 3a103b5 ("selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()")
    Cc: Muhammad Usama Anjum <[email protected]>
    Cc: Shuah Khan <[email protected]>
    Cc: Donet Tom <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    musamaanjum authored and akpm00 committed Nov 7, 2024
    0268d45
  54. ucounts: fix counter leak in inc_rlimit_get_ucounts()

    The inc_rlimit_get_ucounts() increments the specified rlimit counter and
    then checks its limit.  If the value exceeds the limit, the function
    returns an error without decrementing the counter.
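The leak pattern described above, and the usual way to balance it, can be sketched in userspace as follows. The struct and function names are hypothetical, and the real ucounts code uses atomic, hierarchical counters rather than a single plain integer.

```c
#include <stdbool.h>

/* Hypothetical single-level counter with a limit. */
struct sketch_counter {
	long value;
	long max;
};

/* Leaky variant: on failure the increment is left in place. */
static bool inc_leaky(struct sketch_counter *c)
{
	c->value++;
	return c->value <= c->max;
}

/* Balanced variant: undo the increment before reporting failure. */
static bool inc_balanced(struct sketch_counter *c)
{
	c->value++;
	if (c->value > c->max) {
		c->value--;
		return false;
	}
	return true;
}
```

After a failed call, the leaky variant leaves the counter one above its previous value, so repeated failures keep inflating it. The balanced variant leaves the counter unchanged.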
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 15bc01e ("ucounts: Fix signal ucount refcounting")
    Signed-off-by: Andrei Vagin <[email protected]>
    Co-developed-by: Roman Gushchin <[email protected]>
    Signed-off-by: Roman Gushchin <[email protected]>
    Tested-by: Roman Gushchin <[email protected]>
    Acked-by: Alexey Gladkov <[email protected]>
    Cc: Kees Cook <[email protected]>
    Cc: Andrei Vagin <[email protected]>
    Cc: "Eric W. Biederman" <[email protected]>
    Cc: Alexey Gladkov <[email protected]>
    Cc: Oleg Nesterov <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    avagin authored and akpm00 committed Nov 7, 2024
    432dc06
  55. fs/proc: fix compile warning about variable 'vmcore_mmap_ops'

    When building with !CONFIG_MMU, the variable 'vmcore_mmap_ops'
    is defined but not used:
    
    >> fs/proc/vmcore.c:458:42: warning: unused variable 'vmcore_mmap_ops'
         458 | static const struct vm_operations_struct vmcore_mmap_ops = {
    
    Fix this by only defining it when CONFIG_MMU is enabled.
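A minimal sketch of the guard pattern, assuming the only user of the table is itself compiled under the same option. All sketch_* names are hypothetical stand-ins and this is not the code in fs/proc/vmcore.c.

```c
#include <errno.h>

#define CONFIG_MMU 1	/* remove this line to model a !CONFIG_MMU build */

/* Hypothetical stand-in for struct vm_operations_struct. */
struct sketch_vm_ops {
	int (*fault)(void);
};

#ifdef CONFIG_MMU
static int sketch_fault(void)
{
	return 0;
}

/* Defined only where it is referenced, so no unused-variable warning. */
static const struct sketch_vm_ops sketch_mmap_ops = {
	.fault = sketch_fault,
};
#endif

static int sketch_mmap(void)
{
#ifdef CONFIG_MMU
	return sketch_mmap_ops.fault();
#else
	return -ENOSYS;	/* this sketch offers no mmap path without an MMU */
#endif
}
```

Keeping the definition and its only reference under the same preprocessor condition is what prevents the compiler from seeing a defined but unused static variable.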
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 9cb2181 ("vmcore: introduce remap_oldmem_pfn_range()")
    Signed-off-by: Qi Xi <[email protected]>
    Reported-by: kernel test robot <[email protected]>
    Closes: https://lore.kernel.org/lkml/[email protected]/
    Cc: Baoquan He <[email protected]>
    Cc: Dave Young <[email protected]>
    Cc: Michael Holzheu <[email protected]>
    Cc: Vivek Goyal <[email protected]>
    Cc: Wang ShaoBo <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    Qi Xi authored and akpm00 committed Nov 7, 2024
    b8ee299
  56. signal: restore the override_rlimit logic

    Prior to commit d646969 ("Reimplement RLIMIT_SIGPENDING on top of
    ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of
    signals.  However now it's enforced unconditionally, even if
    override_rlimit is set.  This behavior change caused production issues.  
    
    For example, if the limit is reached and a process receives a SIGSEGV
    signal, sigqueue_alloc fails to allocate the necessary resources for the
    signal delivery, preventing the signal from being delivered with siginfo. 
    This prevents the process from correctly identifying the fault address and
    handling the error.  From the user-space perspective, applications are
    unaware that the limit has been reached and that the siginfo is
    effectively 'corrupted'.  This can lead to unpredictable behavior and
    crashes, as we observed with Java applications.
    
    Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and
    skipping the comparison to max there if override_rlimit is set.  This
    effectively restores the old behavior.
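The idea of skipping the limit comparison when an override flag is set can be sketched like this. charge_one() is a hypothetical helper operating on a plain integer, not the signature of inc_rlimit_get_ucounts().

```c
#include <stdbool.h>

/*
 * Charge one unit against *count. Fails when the new value would exceed
 * max, unless override_limit is set, in which case the comparison is skipped.
 */
static bool charge_one(long *count, long max, bool override_limit)
{
	long new_value = *count + 1;

	if (!override_limit && new_value > max)
		return false;	/* nothing was charged, so nothing to undo */
	*count = new_value;
	return true;
}
```

In this sketch a caller that must not fail, such as one queueing a signal that has to carry siginfo, passes the override flag and is charged even when the limit is already reached.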
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: d646969 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
    Signed-off-by: Roman Gushchin <[email protected]>
    Co-developed-by: Andrei Vagin <[email protected]>
    Signed-off-by: Andrei Vagin <[email protected]>
    Acked-by: Oleg Nesterov <[email protected]>
    Acked-by: Alexey Gladkov <[email protected]>
    Cc: Kees Cook <[email protected]>
    Cc: "Eric W. Biederman" <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    rgushchin authored and akpm00 committed Nov 7, 2024
    9e05e5c
  57. ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
    
    Syzkaller is able to provoke null-ptr-dereference in ocfs2_xa_remove():
    
    [   57.319872] (a.out,1161,7):ocfs2_xa_remove:2028 ERROR: status = -12
    [   57.320420] (a.out,1161,7):ocfs2_xa_cleanup_value_truncate:1999 ERROR: Partial truncate while removing xattr overlay.upper.  Leaking 1 clusters and removing the entry
    [   57.321727] BUG: kernel NULL pointer dereference, address: 0000000000000004
    [...]
    [   57.325727] RIP: 0010:ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
    [...]
    [   57.331328] Call Trace:
    [   57.331477]  <TASK>
    [...]
    [   57.333511]  ? do_user_addr_fault+0x3e5/0x740
    [   57.333778]  ? exc_page_fault+0x70/0x170
    [   57.334016]  ? asm_exc_page_fault+0x2b/0x30
    [   57.334263]  ? __pfx_ocfs2_xa_block_wipe_namevalue+0x10/0x10
    [   57.334596]  ? ocfs2_xa_block_wipe_namevalue+0x2a/0xc0
    [   57.334913]  ocfs2_xa_remove_entry+0x23/0xc0
    [   57.335164]  ocfs2_xa_set+0x704/0xcf0
    [   57.335381]  ? _raw_spin_unlock+0x1a/0x40
    [   57.335620]  ? ocfs2_inode_cache_unlock+0x16/0x20
    [   57.335915]  ? trace_preempt_on+0x1e/0x70
    [   57.336153]  ? start_this_handle+0x16c/0x500
    [   57.336410]  ? preempt_count_sub+0x50/0x80
    [   57.336656]  ? _raw_read_unlock+0x20/0x40
    [   57.336906]  ? start_this_handle+0x16c/0x500
    [   57.337162]  ocfs2_xattr_block_set+0xa6/0x1e0
    [   57.337424]  __ocfs2_xattr_set_handle+0x1fd/0x5d0
    [   57.337706]  ? ocfs2_start_trans+0x13d/0x290
    [   57.337971]  ocfs2_xattr_set+0xb13/0xfb0
    [   57.338207]  ? dput+0x46/0x1c0
    [   57.338393]  ocfs2_xattr_trusted_set+0x28/0x30
    [   57.338665]  ? ocfs2_xattr_trusted_set+0x28/0x30
    [   57.338948]  __vfs_removexattr+0x92/0xc0
    [   57.339182]  __vfs_removexattr_locked+0xd5/0x190
    [   57.339456]  ? preempt_count_sub+0x50/0x80
    [   57.339705]  vfs_removexattr+0x5f/0x100
    [...]
    
    The reproducer uses the fault-injection facility to fail
    ocfs2_xa_remove() -> ocfs2_xa_value_truncate() with -ENOMEM.
    
    In this case the comment mentions that we can return 0 if
    ocfs2_xa_cleanup_value_truncate() is going to wipe the entry
    anyway. But the following 'rc' check is wrong and the execution flow
    calls 'ocfs2_xa_remove_entry(loc);' twice:
    * 1st: in ocfs2_xa_cleanup_value_truncate();
    * 2nd: after returning to ocfs2_xa_remove() instead of going to 'out'.
    
    Fix this by skipping the 2nd removal of the same entry and making
    syzkaller repro happy.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Fixes: 399ff3a ("ocfs2: Handle errors while setting external xattr values.")
    Signed-off-by: Andrew Kanner <[email protected]>
    Reported-by: [email protected]
    Closes: https://lore.kernel.org/all/[email protected]/T/
    Tested-by: [email protected]
    Reviewed-by: Joseph Qi <[email protected]>
    Cc: Mark Fasheh <[email protected]>
    Cc: Joel Becker <[email protected]>
    Cc: Junxiao Bi <[email protected]>
    Cc: Changwei Ge <[email protected]>
    Cc: Jun Piao <[email protected]>
    Cc: <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    kanner authored and akpm00 committed Nov 7, 2024
    0b63c0e
  58. mailmap: add entry for Thorsten Blum

    Map my previously used email address to my @linux.dev address.
    
    Link: https://lkml.kernel.org/r/[email protected]
    Signed-off-by: Thorsten Blum <[email protected]>
    Cc: Alex Elder <[email protected]>
    Cc: David S. Miller <[email protected]>
    Cc: Geliang Tang <[email protected]>
    Cc: Kees Cook <[email protected]>
    Cc: Mathieu Othacehe <[email protected]>
    Cc: Matthieu Baerts (NGI0) <[email protected]>
    Cc: Matt Ranostay <[email protected]>
    Cc: Naoya Horiguchi <[email protected]>
    Cc: Neeraj Upadhyay <[email protected]>
    Cc: Quentin Monnet <[email protected]>
    Signed-off-by: Andrew Morton <[email protected]>
    toblux authored and akpm00 committed Nov 7, 2024
    c289f4d
  59. Merge tag 'regulator-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
    
    Pull regulator fixes from Mark Brown:
     "A couple of small fixes for drivers, nothing particularly remarkable"
    
    * tag 'regulator-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
      regulator: rk808: Add apply_bit for BUCK3 on RK809
      regulator: rtq2208: Fix uninitialized use of regulator_config
    torvalds committed Nov 7, 2024
    7b85bb4
  60. Merge tag 'spi-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
    
    Pull spi fix from Mark Brown:
     "An update for the maintainers of the AMD driver following some job
      changes there"
    
    * tag 'spi-fix-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
      MAINTAINERS: update AMD SPI maintainer
    torvalds committed Nov 7, 2024
    906bd68

Commits on Nov 8, 2024

  1. Merge tag 'amd-drm-fixes-6.12-2024-11-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
    
    amd-drm-fixes-6.12-2024-11-07:
    
    amdgpu:
    - Brightness fix
    - DC vbios parsing fix
    - ACPI fix
    - SMU 14.x fix
    - Power workload profile fix
    - GC partitioning fix
    - Debugfs fixes
    
    Signed-off-by: Dave Airlie <[email protected]>
    
    From: Alex Deucher <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    airlied committed Nov 8, 2024
    fd836e8
  2. Merge tag 'usb-serial-6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
    
    Johan writes:
    
    USB-serial fixes for 6.12-rc7
    
    Here's a fix for a long-standing use-after-free in an io_edgeport debug
    printk and some new modem device ids.
    
    All have been in linux-next with no reported issues.
    
    * tag 'usb-serial-6.12-rc7' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
      USB: serial: qcserial: add support for Sierra Wireless EM86xx
      USB: serial: io_edgeport: fix use after free in debug printk
      USB: serial: option: add Quectel RG650V
      USB: serial: option: add Fibocom FG132 0x0112 composition
    gregkh committed Nov 8, 2024
    742afcc
  3. Merge tag 'asoc-fix-v6.12-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
    
    ASoC: Fixes for v6.12
    
    A moderately large pile of small changes here, split fairly evenly
    between fixes and ID additions/quirks and all of it driver specific.
    tiwai committed Nov 8, 2024
    fa59caa
  4. Merge tag 'powerpc-6.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
    
    Pull powerpc fix from Madhavan Srinivasan:
    
     - Fix spurious interrupts in Book3S HV Nested KVM
    
    Thanks to Gautam Menghani.
    
    * tag 'powerpc-6.12-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
      KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts
    torvalds committed Nov 8, 2024
    51b4786
  5. Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
    
    Pull arm64 fixes from Will Deacon:
     "Here is a (hopefully) final round of arm64 fixes for 6.12 that address
      some user-visible floating point register corruption. Both of the
      Marks have been working on this for a couple of weeks and we've ended
      up in a position where SVE is solid but SME still has enough pending
      issues that the most pragmatic solution for the release and stable
      backports is to disable the feature. Yes, it's a shame, but the
      hardware is rare as hen's teeth at the moment and we're better off
      getting back to a known good state before fixing it all properly.
      We're also improving the selftests for 6.13 to help avoid merging
      broken code in the future.
    
      Anyway, the good news is that we're removing a lot more code than
      we're adding.
    
      Summary:
    
       - Fix handling of SVE traps from userspace on preemptible kernels
         when converting the saved floating point state into SVE state.
    
       - Remove broken support for the SMCCCv1.3 "SVE discard hint"
         optimisation.
    
       - Disable SME support, as the current support code suffers from
         numerous issues around signal delivery, ptrace access and
         context-switch which can lead to user-visible corruption of the
         register state"
    
    * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
      arm64: Kconfig: Make SME depend on BROKEN for now
      arm64: smccc: Remove broken support for SMCCCv1.3 SVE discard hint
      arm64/sve: Discard stale CPU state when handling SVE traps
    torvalds committed Nov 8, 2024
    9ea7eda
  6. Merge tag 'bcachefs-2024-11-07' of git://evilpiepirate.org/bcachefs

    Pull bcachefs fixes from Kent Overstreet:
     "Some trivial syzbot fixes, two more serious btree fixes found by
      looping single_devices.ktest small_nodes:
    
       - Topology error on split after merge, where we accidentally picked
         the node being deleted for the pivot, resulting in an assertion pop
    
       - New nodes being preallocated were left on the freedlist, unlocked,
         resulting in them sometimes being accidentally freed: this dated
         from pre-cycle detector, when we could leave them locked. This
         should have resulted in more explosions and fireworks, but turned
         out to be surprisingly hard to hit because the preallocated nodes
         were being used right away.
    
         The fix for this is bigger than we'd like - reworking btree list
         handling was a bit invasive - but we've now got more assertions and
         it's well tested.
    
       - Also another mishandled transaction restart fix (in
         btree_node_prefetch) - we're almost done with those"
    
    * tag 'bcachefs-2024-11-07' of git://evilpiepirate.org/bcachefs:
      bcachefs: Fix UAF in __promote_alloc() error path
      bcachefs: Change OPT_STR max to be 1 less than the size of choices array
      bcachefs: btree_cache.freeable list fixes
      bcachefs: check the invalid parameter for perf test
      bcachefs: add check NULL return of bio_kmalloc in journal_read_bucket
      bcachefs: Ensure BCH_FS_may_go_rw is set before exiting recovery
      bcachefs: Fix topology errors on split after merge
      bcachefs: Ancient versions with bad bkey_formats are no longer supported
      bcachefs: Fix error handling in bch2_btree_node_prefetch()
      bcachefs: Fix null ptr deref in bucket_gen_get()
    torvalds committed Nov 8, 2024
    b5f1b48
  7. Merge tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
    
    Pull btrfs fixes from David Sterba:
     "A few more one-liners that fix some user visible problems:
    
       - use correct range when clearing qgroup reservations after COW
    
       - properly reset freed delayed ref list head
    
       - fix ro/rw subvolume mounts to be backward compatible with old and
         new mount API"
    
    * tag 'for-6.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
      btrfs: fix the length of reserved qgroup to free
      btrfs: reinitialize delayed ref list after deleting it from the list
      btrfs: fix per-subvolume RO/RW flags with new mount API
    torvalds committed Nov 8, 2024
    9183e03
  8. Merge tag 'slab-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
    
    Pull slab fix from Vlastimil Babka:
    
     - Fix for duplicate caches in some arm64 configurations with
       CONFIG_SLAB_BUCKETS (Koichiro Den)
    
    * tag 'slab-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
      mm/slab: fix warning caused by duplicate kmem_cache creation in kmem_buckets_create
    torvalds committed Nov 8, 2024
    f1dce1f
  9. Merge tag 'media/v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
    
    Pull media fixes from Mauro Carvalho Chehab:
    
     - dvb-core fixes for vb2 check and device registration
    
     - v4l2-core: fix an issue with error handling for VIDIOC_G_CTRL
    
     - vb2 core: fix an issue with vb plane copy logic
    
     - videobuf2-core: copy vb planes unconditionally
    
     - vivid: fix buffer overwrite when using > 32 buffers
    
     - vivid: fix a potential division by zero due to an issue at v4l2-tpg
    
     - some spectre vulnerability fixes
    
     - several OOM access fixes
    
     - some buffer overflow fixes
    
    * tag 'media/v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
      media: videobuf2-core: copy vb planes unconditionally
      media: dvbdev: fix the logic when DVB_DYNAMIC_MINORS is not set
      media: vivid: fix buffer overwrite when using > 32 buffers
      media: pulse8-cec: fix data timestamp at pulse8_setup()
      media: cec: extron-da-hd-4k-plus: don't use -1 as an error code
      media: stb0899_algo: initialize cfr before using it
      media: adv7604: prevent underflow condition when reporting colorspace
      media: cx24116: prevent overflows on SNR calculus
      media: ar0521: don't overflow when checking PLL values
      media: s5p-jpeg: prevent buffer overflows
      media: av7110: fix a spectre vulnerability
      media: mgb4: protect driver against spectre
      media: dvb_frontend: don't play tricks with underflow values
      media: dvbdev: prevent the risk of out of memory access
      media: v4l2-tpg: prevent the risk of a division by zero
      media: v4l2-ctrls-api: fix error handling for v4l2_g_ctrl()
      media: dvb-core: add missing buffer index check
    torvalds committed Nov 8, 2024
    ceb0613
  10. Merge tag 'sound-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
    
    Pull sound fixes from Takashi Iwai:
     "Still more changes floating than wished at this late stage, but all
      are small device-specific fixes, and look less troublesome.
    
      Including a few ASoC quirk / ID additions, a series of ASoC STM fixes,
      HD-audio conexant codec regression fix, and other various quirks and
      device-specific fixes"
    
    * tag 'sound-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
      ASoC: SOF: sof-client-probes-ipc4: Set param_size extension bits
      ASoC: stm: Prevent potential division by zero in stm32_sai_get_clk_div()
      ASoC: stm: Prevent potential division by zero in stm32_sai_mclk_round_rate()
      ASoC: amd: yc: Support dmic on another model of Lenovo Thinkpad E14 Gen 6
      ASoC: SOF: amd: Fix for incorrect DMA ch status register offset
      ASoC: amd: yc: fix internal mic on Xiaomi Book Pro 14 2022
      ASoC: stm32: spdifrx: fix dma channel release in stm32_spdifrx_remove
      MAINTAINERS: Generic Sound Card section
      ALSA: usb-audio: Add quirk for HP 320 FHD Webcam
      ASoC: tas2781: Add new driver version for tas2563 & tas2781 qfn chip
      ALSA: firewire-lib: fix return value on fail in amdtp_tscm_init()
      ALSA: ump: Don't enumeration invalid groups for legacy rawmidi
      Revert "ALSA: hda/conexant: Mute speakers at suspend / shutdown"
    torvalds committed Nov 8, 2024
    50643bb
  11. i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set
    
    When the Tx FIFO is empty and the last command has no STOP bit
    set, the master holds SCL low. If I2C_DYNAMIC_TAR_UPDATE is not
    set, BIT(13) MST_ON_HOLD of IC_RAW_INTR_STAT is not enabled,
    causing the __i2c_dw_disable() timeout. This is quite similar to
    commit 2409205 ("i2c: designware: fix __i2c_dw_disable() in
    case master is holding SCL low"). Also check BIT(7)
    MST_HOLD_TX_FIFO_EMPTY in IC_STATUS, which is available when
    IC_STAT_FOR_CLK_STRETCH is set.
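The combined check described above can be sketched as a pure function over the two register values. The bit positions come from this message; the SK_* macro names and master_holding_scl() are hypothetical and are not the driver's identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_BIT(n) (UINT32_C(1) << (n))

/* Bit positions taken from the commit message; macro names are hypothetical. */
#define SK_RAW_INTR_MST_ON_HOLD          SK_BIT(13)	/* in IC_RAW_INTR_STAT */
#define SK_STATUS_MST_HOLD_TX_FIFO_EMPTY SK_BIT(7)	/* in IC_STATUS */

/* True if either register reports that the master is holding SCL low. */
static bool master_holding_scl(uint32_t raw_intr_stat, uint32_t ic_status)
{
	return (raw_intr_stat & SK_RAW_INTR_MST_ON_HOLD) ||
	       (ic_status & SK_STATUS_MST_HOLD_TX_FIFO_EMPTY);
}
```

Checking both sources means the hold condition is still detected on configurations where only one of the two indications is available.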
    
    Fixes: 2409205 ("i2c: designware: fix __i2c_dw_disable() in case master is holding SCL low")
    Co-developed-by: Xiaowu Ding <[email protected]>
    Signed-off-by: Xiaowu Ding <[email protected]>
    Co-developed-by: Angus Chen <[email protected]>
    Signed-off-by: Angus Chen <[email protected]>
    Signed-off-by: Liu Peibao <[email protected]>
    Acked-by: Jarkko Nikula <[email protected]>
    Signed-off-by: Andi Shyti <[email protected]>
    Liu Peibao authored and Andi Shyti committed Nov 8, 2024
    8de3e97
  12. Merge tag 'drm-misc-fixes-2024-11-08' of https://gitlab.freedesktop.o…

    …rg/drm/misc/kernel into drm-fixes
    
    Short summary of fixes pull:
    
    imagination:
    - Track PVR context per file
    - Break ref-counting cycle
    
    panel-orientation-quirks:
    - Fix matching Lenovo Yoga Tab 3 X90F
    
    panthor:
    - Lock VM array
    - Be strict about I/O mapping flags
    
    Signed-off-by: Dave Airlie <[email protected]>
    
    From: Thomas Zimmermann <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
    airlied committed Nov 8, 2024
    9b984a7
  13. Merge tag 'drm-xe-fixes-2024-11-08' of https://gitlab.freedesktop.org…

    …/drm/xe/kernel into drm-fixes
    
    Driver Changes:
    - Fix ccs_mode setting for Xe2 and later (Balasubramani)
    - Synchronize ccs_mode setting with client creation (Balasubramani)
    - Apply scheduling WA for LNL in additional places as needed
      (Nirmoy)
    - Fix leak and lock handling in error paths of xe_exec ioctl
      (Matthew Brost)
    - Fix GGTT allocation leak leading to eventual crash in SR-IOV
      (Michal Wajdeczko)
    - Move run_ticks update out of job handling to avoid synchronization
      with reader (Lucas)
    
    Signed-off-by: Dave Airlie <[email protected]>
    
    From: Lucas De Marchi <[email protected]>
    Link: https://patchwork.freedesktop.org/patch/msgid/4ffcebtluaaaohquxfyf5babpihmtscxwad3jjmt5nggwh2xpm@ztw67ucywttg
    airlied committed Nov 8, 2024
    1a6bbc4
  14. Merge tag 'drm-fixes-2024-11-09' of https://gitlab.freedesktop.org/dr…

    …m/kernel
    
    Pull drm fixes from Dave Airlie:
     "Weekly fixes, usual leaders in amdgpu and xe, then a panel quirk, and
      some fixes to imagination and panthor drivers. Seems around the usual
      level for this time and don't know of any big problems.
    
      amdgpu:
       - Brightness fix
       - DC vbios parsing fix
       - ACPI fix
       - SMU 14.x fix
       - Power workload profile fix
       - GC partitioning fix
       - Debugfs fixes
    
      imagination:
       - Track PVR context per file
       - Break ref-counting cycle
    
      panel-orientation-quirks:
       - Fix matching Lenovo Yoga Tab 3 X90F
    
      panthor:
       - Lock VM array
       - Be strict about I/O mapping flags
    
      xe:
       - Fix ccs_mode setting for Xe2 and later
       - Synchronize ccs_mode setting with client creation
       - Apply scheduling WA for LNL in additional places as needed
       - Fix leak and lock handling in error paths of xe_exec ioctl
       - Fix GGTT allocation leak leading to eventual crash in SR-IOV
       - Move run_ticks update out of job handling to avoid synchronization
         with reader"
    
    * tag 'drm-fixes-2024-11-09' of https://gitlab.freedesktop.org/drm/kernel: (23 commits)
      drm/panthor: Be stricter about IO mapping flags
      drm/panthor: Lock XArray when getting entries for the VM
      drm: panel-orientation-quirks: Make Lenovo Yoga Tab 3 X90F DMI match less strict
      drm/xe: Stop accumulating LRC timestamp on job_free
      drm/xe/pf: Fix potential GGTT allocation leak
      drm/xe: Drop VM dma-resv lock on xe_sync_in_fence_get failure in exec IOCTL
      drm/xe: Fix possible exec queue leak in exec IOCTL
      drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()
      drm/amdgpu: Adjust debugfs eviction and IB access permissions
      drm/amdgpu: Adjust debugfs register access permissions
      drm/amdgpu: Fix DPX valid mode check on GC 9.4.3
      drm/amd/pm: correct the workload setting
      drm/amd/pm: always pick the pptable from IFWI
      drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported
      drm/amd/display: parse umc_info or vram_info based on ASIC
      drm/amd/display: Fix brightness level not retained over reboot
      drm/xe/guc/tlb: Flush g2h worker in case of tlb timeout
      drm/xe/ufence: Flush xe ordered_wq in case of ufence timeout
      drm/xe: Move LNL scheduling WA to xe_device.h
      drm/xe: Use the filelist from drm for ccs_mode change
      ...
    torvalds committed Nov 8, 2024
    952a33d
  15. Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/g…

    …it/jejb/scsi
    
    Pull SCSI fixes from James Bottomley:
     "Two small fixes, the drivers one in ufs simply delays running a work
      queue and the generic one in zoned storage switches to a more correct
      API that tries the standard buddy allocator first (for small
      allocations); this fixes an allocation problem with small allocations
      seen under memory pressure"
    
    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
      scsi: ufs: core: Start the RTC update work later
      scsi: sd_zbc: Use kvzalloc() to allocate REPORT ZONES buffer
    torvalds committed Nov 8, 2024
    c291c9c
  16. Merge tag 'v6.12-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd

    Pull smb server fixes from Steve French:
     "Four fixes, all also marked for stable:
    
       - fix two potential use after free issues
    
       - fix OOM issue with many simultaneous requests
    
       - fix missing error check in RPC pipe handling"
    
    * tag 'v6.12-rc6-ksmbd-fixes' of git://git.samba.org/ksmbd:
      ksmbd: check outstanding simultaneous SMB operations
      ksmbd: fix slab-use-after-free in smb3_preauth_hash_rsp
      ksmbd: fix slab-use-after-free in ksmbd_smb2_session_create
      ksmbd: Fix the missing xa_store error check
    torvalds committed Nov 8, 2024
    1eb714c
  17. Merge tag 'acpi-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kerne…

    …l/git/rafael/linux-pm
    
    Pull ACPI fix from Rafael Wysocki:
     "Fix the ACPI processor driver initialization ordering after recent
      changes to avoid calling init_freq_invariance_cppc() too early on AMD
      platforms (Mario Limonciello)"
    
    * tag 'acpi-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
      ACPI: processor: Move arch_init_invariance_cppc() call later
    torvalds committed Nov 8, 2024
    c7a8f2a
  18. Merge tag 'pm-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/…

    …git/rafael/linux-pm
    
    Pull power management fix from Rafael Wysocki:
     "Fix the asymmetric CPU capacity support code in the intel_pstate
      driver, added during this development cycle, to address a corner case
      in which the capacity of a CPU going online is not updated (Rafael
      Wysocki)"
    
    * tag 'pm-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
      cpufreq: intel_pstate: Update asym capacity for CPUs that were offline initially
      cpufreq: intel_pstate: Clear hybrid_max_perf_cpu before driver registration
    torvalds committed Nov 8, 2024
    4f63642
  19. Merge tag 'thermal-6.12-rc7' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/rafael/linux-pm
    
    Pull thermal control fixes from Rafael Wysocki:
     "These fix one issue in the qcom lmh thermal driver, a DT handling
      issue in the thermal core and two issues in the userspace thermal
      library:
    
       - Allow tripless thermal zones defined in a DT to be registered in
         accordance with the thermal DT bindings (Icenowy Zheng)
    
       - Annotate LMH IRQs with lockdep classes to prevent lockdep from
         reporting a possible recursive locking issue that cannot really
         occur (Dmitry Baryshkov)
    
       - Improve the thermal library "make clean" to remove a leftover
         symbolic link created during compilation and fix the sampling
         handler invocation in that library to pass the correct pointer to
         it (Emil Dahl Juhl, zhang jiao)"
    
    * tag 'thermal-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
      thermal/of: support thermal zones w/o trips subnode
      tools/lib/thermal: Remove the thermal.h soft link when doing make clean
      tools/lib/thermal: Fix sampling handler context ptr
      thermal/drivers/qcom/lmh: Remove false lockdep backtrace
    torvalds committed Nov 8, 2024
    da4373f

Commits on Nov 9, 2024

  1. Merge tag 'block-6.12-20241108' of git://git.kernel.dk/linux

    Pull block fix from Jens Axboe:
     "Single fix for an issue triggered with PROVE_RCU=y, with nvme using
      the wrong iterators for an SRCU protected list"
    
    * tag 'block-6.12-20241108' of git://git.kernel.dk/linux:
      nvme/host: Fix RCU list traversal to use SRCU primitive
    torvalds committed Nov 9, 2024
    a58f4dd
  2. Merge tag 'v6.12-rc6-smb3-client-fix' of git://git.samba.org/sfrench/…

    …cifs-2.6
    
    Pull smb client fix from Steve French:
     "Fix net namespace refcount use after free issue"
    
    * tag 'v6.12-rc6-smb3-client-fix' of git://git.samba.org/sfrench/cifs-2.6:
      smb: client: Fix use-after-free of network namespace.
    torvalds committed Nov 9, 2024
    bceea66
  3. Merge tag 'nfsd-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/…

    …git/cel/linux
    
    Pull nfsd fix from Chuck Lever:
    
     - Fix a v6.12-rc regression when exporting ext4 filesystems with NFSD
    
    * tag 'nfsd-6.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
      NFSD: Fix READDIR on NFSv3 mounts of ext4 exports
    torvalds committed Nov 9, 2024
    de2f378
  4. Merge tag 'i2c-host-fixes-6.12-rc7' of git://git.kernel.org/pub/scm/l…

    …inux/kernel/git/andi.shyti/linux into i2c/for-current
    
    i2c-host fixes for v6.12-rc7
    
    In designware an incorrect behavior has been fixed when
    concluding a transmission.
    
    Fixed return error value evaluation in the Mule multiplexer.
    Wolfram Sang committed Nov 9, 2024
    547aad9

Commits on Nov 10, 2024

  1. Merge tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/gregkh/staging
    
    Pull staging driver fixes from Greg KH:
     "Here are two small memory leak fixes for the vchiq_arm staging driver
      that have been sitting in my tree for weeks and should get merged for
      6.12-rc7 so that people don't keep tripping over them.
    
      They both have been in linux-next for a while with no reported
      problems"
    
    * tag 'staging-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
      staging: vchiq_arm: Use devm_kzalloc() for drv_mgmt allocation
      staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation
    torvalds committed Nov 10, 2024
    023d4fc
  2. Merge tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel…

    …/git/gregkh/usb
    
    Pull USB/Thunderbolt fixes from Greg KH:
     "Here are some small remaining USB and Thunderbolt fixes and device ids
      for 6.12-rc7. Included in here are:
    
       - new USB serial driver device ids
    
       - thunderbolt driver fixes for reported problems
    
       - typec bugfixes
    
       - dwc3 driver fix
    
       - musb driver fix
    
      All of these have been in linux-next this past week with no reported
      issues"
    
    * tag 'usb-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
      USB: serial: qcserial: add support for Sierra Wireless EM86xx
      thunderbolt: Fix connection issue with Pluggable UD-4VPD dock
      usb: typec: fix potential out of bounds in ucsi_ccg_update_set_new_cam_cmd()
      usb: dwc3: fix fault at system suspend if device was already runtime suspended
      usb: typec: qcom-pmic: init value of hdr_len/txbuf_len earlier
      usb: musb: sunxi: Fix accessing an released usb phy
      USB: serial: io_edgeport: fix use after free in debug printk
      USB: serial: option: add Quectel RG650V
      USB: serial: option: add Fibocom FG132 0x0112 composition
      thunderbolt: Add only on-board retimers when !CONFIG_USB4_DEBUGFS_MARGINING
    torvalds committed Nov 10, 2024
    a558cc3
  3. Merge tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.o…

    …rg/pub/scm/linux/kernel/git/akpm/mm
    
    Pull misc fixes from Andrew Morton:
     "20 hotfixes, 14 of which are cc:stable.
    
      Three affect DAMON. Lorenzo's five-patch series to address the
      mmap_region error handling is here also.
    
      Apart from that, various singletons"
    
    * tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
      mailmap: add entry for Thorsten Blum
      ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
      signal: restore the override_rlimit logic
      fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
      ucounts: fix counter leak in inc_rlimit_get_ucounts()
      selftests: hugetlb_dio: check for initial conditions to skip in the start
      mm: fix docs for the kernel parameter ``thp_anon=``
      mm/damon/core: avoid overflow in damon_feed_loop_next_input()
      mm/damon/core: handle zero schemes apply interval
      mm/damon/core: handle zero {aggregation,ops_update} intervals
      mm/mlock: set the correct prev on failure
      objpool: fix to make percpu slot allocation more robust
      mm/page_alloc: keep track of free highatomic
      mm: resolve faulty mmap_region() error path behaviour
      mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
      mm: refactor map_deny_write_exec()
      mm: unconditionally close VMAs on error
      mm: avoid unsafe VMA hook invocation when error arises on mmap hook
      mm/thp: fix deferred split unqueue naming and locking
      mm/thp: fix deferred split queue not partially_mapped
    torvalds committed Nov 10, 2024
    28e4319
  4. Merge tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/…

    …linux/kernel/git/tip/tip
    
    Pull irq fix from Borislav Petkov:
    
     - Make sure GICv3 controller interrupt activation doesn't race with a
       concurrent deactivation due to propagation delays of the register
       write
    
    * tag 'irq_urgent_for_v6.12_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
      irqchip/gic-v3: Force propagation of the active state with a read-back
    torvalds committed Nov 10, 2024
    a9cda7c
  5. filemap: Fix bounds checking in filemap_read()

    If the caller supplies an iocb->ki_pos value that is close to the
    filesystem upper limit, and an iterator with a count that causes us to
    overflow that limit, then filemap_read() enters an infinite loop.
    
    This behaviour was discovered when testing xfstests generic/525 with the
    "localio" optimisation for loopback NFS mounts.
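
The overflow described above can be illustrated with a small clamp on the requested byte count. This is a minimal sketch and not the kernel's code: the helper name `clamp_read_count()` and its parameters are hypothetical, with `max_bytes` standing in for the filesystem upper limit.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not the kernel's code: clamp a read of `count`
 * bytes starting at `pos` so that it never extends past `max_bytes`.
 * Comparing against the remaining room (max_bytes - pos) avoids
 * computing pos + count, which could overflow.
 */
static uint64_t clamp_read_count(uint64_t pos, uint64_t count,
				 uint64_t max_bytes)
{
	if (pos >= max_bytes)
		return 0;	/* nothing readable at or past the limit */

	uint64_t room = max_bytes - pos;

	return count < room ? count : room;
}
```

Because the clamped count can never carry the position beyond the limit, a loop that advances by the clamped amount always reaches its end condition.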
    
    Reported-by: Mike Snitzer <[email protected]>
    Fixes: c2a9737 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
    Tested-by: Mike Snitzer <[email protected]>
    Signed-off-by: Trond Myklebust <[email protected]>
    Signed-off-by: Linus Torvalds <[email protected]>
    Trond Myklebust authored and torvalds committed Nov 10, 2024
    ace149e
  6. Merge tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/ke…

    …rnel/git/wsa/linux
    
    Pull i2c fixes from Wolfram Sang:
     "i2c-host fixes for v6.12-rc7 (from Andi):
    
       - Fix designware incorrect behavior when concluding a transmission
    
       - Fix Mule multiplexer error value evaluation"
    
    * tag 'i2c-for-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
      i2c: designware: do not hold SCL low when I2C_DYNAMIC_TAR_UPDATE is not set
      i2c: muxes: Fix return value check in mule_i2c_mux_probe()
    torvalds committed Nov 10, 2024
    d7e67a9
  7. Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux…

    …/kernel/git/clk/linux
    
    Pull clk fixes from Stephen Boyd:
     "A handful of Qualcomm clk driver fixes:
    
       - Correct flags for X Elite USB MP GDSC and pcie pipediv2 clocks
    
       - Fix alpha PLL post_div mask for the cases where width is not
         specified
    
       - Avoid hangs in the SM8350 video driver (venus) by setting HW_CTRL
         trigger feature on the video clocks"
    
    * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
      clk: qcom: gcc-x1e80100: Fix USB MP SS1 PHY GDSC pwrsts flags
      clk: qcom: gcc-x1e80100: Fix halt_check for pipediv2 clocks
      clk: qcom: clk-alpha-pll: Fix pll post div mask when width is not set
      clk: qcom: videocc-sm8350: use HW_CTRL_TRIGGER for vcodec GDSCs
    torvalds committed Nov 10, 2024
    541f3d8
  8. Linux 6.12-rc7

    torvalds committed Nov 10, 2024
    2d5404c

Commits on Nov 11, 2024

  1. btrfs: don't take dev_replace rwsem on task already holding it

    Running fstests btrfs/011 with MKFS_OPTIONS="-O rst" to force the usage of
    the RAID stripe-tree, we get the following splat from lockdep:
    
     BTRFS info (device sdd): dev_replace from /dev/sdd (devid 1) to /dev/sdb started
    
     ============================================
     WARNING: possible recursive locking detected
     6.11.0-rc3-btrfs-for-next #599 Not tainted
     --------------------------------------------
     btrfs/2326 is trying to acquire lock:
     ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
    
     but task is already holding lock:
     ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
    
     other info that might help us debug this:
      Possible unsafe locking scenario:
    
            CPU0
            ----
       lock(&fs_info->dev_replace.rwsem);
       lock(&fs_info->dev_replace.rwsem);
    
      *** DEADLOCK ***
    
      May be due to missing lock nesting notation
    
     1 lock held by btrfs/2326:
      #0: ffff88810f215c98 (&fs_info->dev_replace.rwsem){++++}-{3:3}, at: btrfs_map_block+0x39f/0x2250
    
     stack backtrace:
     CPU: 1 UID: 0 PID: 2326 Comm: btrfs Not tainted 6.11.0-rc3-btrfs-for-next #599
     Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
     Call Trace:
      <TASK>
      dump_stack_lvl+0x5b/0x80
      __lock_acquire+0x2798/0x69d0
      ? __pfx___lock_acquire+0x10/0x10
      ? __pfx___lock_acquire+0x10/0x10
      lock_acquire+0x19d/0x4a0
      ? btrfs_map_block+0x39f/0x2250
      ? __pfx_lock_acquire+0x10/0x10
      ? find_held_lock+0x2d/0x110
      ? lock_is_held_type+0x8f/0x100
      down_read+0x8e/0x440
      ? btrfs_map_block+0x39f/0x2250
      ? __pfx_down_read+0x10/0x10
      ? do_raw_read_unlock+0x44/0x70
      ? _raw_read_unlock+0x23/0x40
      btrfs_map_block+0x39f/0x2250
      ? btrfs_dev_replace_by_ioctl+0xd69/0x1d00
      ? btrfs_bio_counter_inc_blocked+0xd9/0x2e0
      ? __kasan_slab_alloc+0x6e/0x70
      ? __pfx_btrfs_map_block+0x10/0x10
      ? __pfx_btrfs_bio_counter_inc_blocked+0x10/0x10
      ? kmem_cache_alloc_noprof+0x1f2/0x300
      ? mempool_alloc_noprof+0xed/0x2b0
      btrfs_submit_chunk+0x28d/0x17e0
      ? __pfx_btrfs_submit_chunk+0x10/0x10
      ? bvec_alloc+0xd7/0x1b0
      ? bio_add_folio+0x171/0x270
      ? __pfx_bio_add_folio+0x10/0x10
      ? __kasan_check_read+0x20/0x20
      btrfs_submit_bio+0x37/0x80
      read_extent_buffer_pages+0x3df/0x6c0
      btrfs_read_extent_buffer+0x13e/0x5f0
      read_tree_block+0x81/0xe0
      read_block_for_search+0x4bd/0x7a0
      ? __pfx_read_block_for_search+0x10/0x10
      btrfs_search_slot+0x78d/0x2720
      ? __pfx_btrfs_search_slot+0x10/0x10
      ? lock_is_held_type+0x8f/0x100
      ? kasan_save_track+0x14/0x30
      ? __kasan_slab_alloc+0x6e/0x70
      ? kmem_cache_alloc_noprof+0x1f2/0x300
      btrfs_get_raid_extent_offset+0x181/0x820
      ? __pfx_lock_acquire+0x10/0x10
      ? __pfx_btrfs_get_raid_extent_offset+0x10/0x10
      ? down_read+0x194/0x440
      ? __pfx_down_read+0x10/0x10
      ? do_raw_read_unlock+0x44/0x70
      ? _raw_read_unlock+0x23/0x40
      btrfs_map_block+0x5b5/0x2250
      ? __pfx_btrfs_map_block+0x10/0x10
      scrub_submit_initial_read+0x8fe/0x11b0
      ? __pfx_scrub_submit_initial_read+0x10/0x10
      submit_initial_group_read+0x161/0x3a0
      ? lock_release+0x20e/0x710
      ? __pfx_submit_initial_group_read+0x10/0x10
      ? __pfx_lock_release+0x10/0x10
      scrub_simple_mirror.isra.0+0x3eb/0x580
      scrub_stripe+0xe4d/0x1440
      ? lock_release+0x20e/0x710
      ? __pfx_scrub_stripe+0x10/0x10
      ? __pfx_lock_release+0x10/0x10
      ? do_raw_read_unlock+0x44/0x70
      ? _raw_read_unlock+0x23/0x40
      scrub_chunk+0x257/0x4a0
      scrub_enumerate_chunks+0x64c/0xf70
      ? __mutex_unlock_slowpath+0x147/0x5f0
      ? __pfx_scrub_enumerate_chunks+0x10/0x10
      ? bit_wait_timeout+0xb0/0x170
      ? __up_read+0x189/0x700
      ? scrub_workers_get+0x231/0x300
      ? up_write+0x490/0x4f0
      btrfs_scrub_dev+0x52e/0xcd0
      ? create_pending_snapshots+0x230/0x250
      ? __pfx_btrfs_scrub_dev+0x10/0x10
      btrfs_dev_replace_by_ioctl+0xd69/0x1d00
      ? lock_acquire+0x19d/0x4a0
      ? __pfx_btrfs_dev_replace_by_ioctl+0x10/0x10
      ? lock_release+0x20e/0x710
      ? btrfs_ioctl+0xa09/0x74f0
      ? __pfx_lock_release+0x10/0x10
      ? do_raw_spin_lock+0x11e/0x240
      ? __pfx_do_raw_spin_lock+0x10/0x10
      btrfs_ioctl+0xa14/0x74f0
      ? lock_acquire+0x19d/0x4a0
      ? find_held_lock+0x2d/0x110
      ? __pfx_btrfs_ioctl+0x10/0x10
      ? lock_release+0x20e/0x710
      ? do_sigaction+0x3f0/0x860
      ? __pfx_do_vfs_ioctl+0x10/0x10
      ? do_raw_spin_lock+0x11e/0x240
      ? lockdep_hardirqs_on_prepare+0x270/0x3e0
      ? _raw_spin_unlock_irq+0x28/0x50
      ? do_sigaction+0x3f0/0x860
      ? __pfx_do_sigaction+0x10/0x10
      ? __x64_sys_rt_sigaction+0x18e/0x1e0
      ? __pfx___x64_sys_rt_sigaction+0x10/0x10
      ? __x64_sys_close+0x7c/0xd0
      __x64_sys_ioctl+0x137/0x190
      do_syscall_64+0x71/0x140
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
     RIP: 0033:0x7f0bd1114f9b
     Code: Unable to access opcode bytes at 0x7f0bd1114f71.
     RSP: 002b:00007ffc8a8c3130 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
     RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f0bd1114f9b
     RDX: 00007ffc8a8c35e0 RSI: 00000000ca289435 RDI: 0000000000000003
     RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000007
     R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffc8a8c6c85
     R13: 00000000398e72a0 R14: 0000000000004361 R15: 0000000000000004
      </TASK>
    
    This happens because on RAID stripe-tree filesystems we recurse back into
    btrfs_map_block() on scrub to perform the logical to device physical
    mapping.
    
    But as the device replace task is already holding the dev_replace::rwsem
    we deadlock.
    
    So don't take the dev_replace::rwsem in case our task is the task performing
    the device replace.
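
The idea of skipping the lock when the caller is the replace task itself can be modelled in plain C. This is a toy sketch under stated assumptions: the struct, its fields and both helpers are hypothetical, a counter stands in for the rwsem, and task ids are assumed to be nonzero.

```c
#include <stdbool.h>

/* Toy model of a lock that records which task runs the replace operation. */
struct replace_state {
	int replace_task;	/* id of the task doing the replace, 0 if none */
	int read_holders;	/* stand-in for the rwsem's reader count */
};

/* Returns true if the lock was taken and must be released by the caller. */
static bool map_block_lock(struct replace_state *rs, int current_task)
{
	if (rs->replace_task == current_task)
		return false;	/* our own task already holds it: skip */

	rs->read_holders++;	/* stand-in for down_read() */
	return true;
}

/* Releases the lock only if map_block_lock() actually took it. */
static void map_block_unlock(struct replace_state *rs, bool locked)
{
	if (locked)
		rs->read_holders--;	/* stand-in for up_read() */
}
```

Returning whether the lock was taken lets the caller pair the unlock with the lock, so the recursive call made by the replace task neither acquires nor releases anything.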
    
    Suggested-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Nov 11, 2024
    8cca35c
  2. btrfs: make assert_rbio() to only check CONFIG_BTRFS_ASSERT

    According to the description, CONFIG_BTRFS_DEBUG is only for extra
    debug info, meanwhile sanity checks should be managed by
    CONFIG_BTRFS_ASSERT.
    
    There is no need to check both to enable assert_rbio().
    
    Just remove the check for CONFIG_BTRFS_DEBUG.
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    c186345
  3. btrfs: split out CONFIG_BTRFS_EXPERIMENTAL from CONFIG_BTRFS_DEBUG

    Currently CONFIG_BTRFS_DEBUG is used not only for the extra debugging
    output, but also for experimental features.
    
    This makes it hard to distinguish planned but not yet stable features
    from those purely designed for debugging.
    
    This patch splits the following features into CONFIG_BTRFS_EXPERIMENTAL:
    
    - Extent map shrinker
      This seems to be the first one to exit experimental.
    
    - Extent tree v2
      This seems to be the last one to graduate from experimental.
    
    - Raid stripe tree
    - Csum offload mode
    - Send protocol v3
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    67cd3f2
  4. btrfs: zlib: make the compression path to handle sector size < page size

    Inside zlib_compress_folios(), each time we switch the input page cache,
    the @start is increased by PAGE_SIZE.
    
    But with the upcoming compression support for sector size < page size
    (previously compression was supported only when the range was fully page
    aligned), this does not handle the following case:
    
        0          32K         64K          96K
        |          |///////////||///////////|
    
    @start has the initial value 32K, indicating the start filepos of the
    to-be-compressed range.
    
    And when grabbing the first page as input, we always call "start +=
    PAGE_SIZE;".
    
    But since @start begins at 32K, it is increased by 64K and becomes 96K
    for the next range, causing an incorrect input range and corruption for
    the future subpage compression.
    
    Fix it by increasing @start only by the input size.
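
The arithmetic behind "increase by the input size" can be sketched as follows. This is an illustration only: the helper name `input_len_in_page()` is hypothetical, and the 64K page size is an assumption taken from the example above.

```c
#include <stdint.h>

#define PAGE_SIZE_64K UINT64_C(65536) /* assumed page size for this example */

/*
 * Hypothetical helper: number of bytes of the range [start, end) that
 * live in the page containing `start`. Advancing `start` by this amount,
 * rather than by a full page, keeps it inside the range.
 */
static uint64_t input_len_in_page(uint64_t start, uint64_t end)
{
	uint64_t page_end = (start / PAGE_SIZE_64K + 1) * PAGE_SIZE_64K;

	return (end < page_end ? end : page_end) - start;
}
```

For the range [32K, 96K), the first step covers 32K and moves @start to 64K, where a fixed PAGE_SIZE step would have jumped to 96K.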
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    f6ebedb
  5. btrfs: zstd: make the compression path to handle sector size < page size

    Inside zstd_compress_folios(), after exhausting one input page, we need
    to switch to the next page as input.
    
    However when counting the total input bytes (@tot_in), we always increase
    it by PAGE_SIZE.
    
    For the following case, it can cause incorrect value:
    
            0          32K         64K          96K
            |          |///////////||///////////|
    
    After compressing range [32K, 64K), we switch to the next page and
    increase @tot_in by 64K, while we only read 32K.
    
    This causes @total_in to return a value larger than the input
    length.
    
    Fix it by increasing @tot_in only by the input size.
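
The accounting can be modelled with a short loop that walks the range page by page. This is a sketch under stated assumptions: the function name `total_input_bytes()` and its parameters are hypothetical, and it models only the byte counting, not the compression itself.

```c
#include <stdint.h>

/*
 * Hypothetical model: walk the range [start, start + len) page by page
 * and add up the bytes consumed, counting only the bytes that belong to
 * the range in each page rather than a full page every time.
 */
static uint64_t total_input_bytes(uint64_t start, uint64_t len,
				  uint64_t page_size)
{
	uint64_t end = start + len;
	uint64_t tot_in = 0;

	while (start < end) {
		uint64_t page_end = (start / page_size + 1) * page_size;
		uint64_t chunk = (end < page_end ? end : page_end) - start;

		tot_in += chunk;	/* not "+= page_size" */
		start += chunk;
	}
	return tot_in;
}
```

With per-chunk accounting the total always equals the range length, including for a range that begins in the middle of a page.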
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    90275a7
  6. btrfs: compression: add an ASSERT() to ensure the read-in length is sane

    There are already two bugs (one in zlib, one in zstd) where the
    compression path did not handle sector size < page size cases well,
    resulting in @total_in being larger than the to-be-compressed range
    length.
    
    That is enough reason to add an ASSERT() to make sure the total read-in
    length returned by btrfs_compress_folios() doesn't exceed the input
    length.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    dd5e276
  7. btrfs: wait for writeback if sector size is smaller than page size

    [PROBLEM]
    If sector perfect compression is enabled for the sector size < page size
    case, the following layout can lead to dirty ranges not being written back:
    
         0     32K     64K     96K     128K
         |     |///////||//////|     |/|
                                     124K
    
    In the above example, the page size is 64K, and we need to write back
    the two pages shown.
    
    - Submit for page 0 (main thread)
      We found delalloc range [32K, 96K), which can be compressed.
      So we queue an async range for [32K, 96K).
      This means, the page unlock/clearing dirty/setting writeback will
      all happen in a workqueue context.
    
    - The compression is done, and compressed range is submitted (workqueue)
      Since the compression is done asynchronously, it can finish before
      the main thread submits page 64K.
    
      Now the whole range [32K, 96K), involving two pages, will be marked
      writeback.
    
    - Submit for page 64K (main thread)
      extent_write_cache_pages() sees that wbc->sync_mode is WB_SYNC_NONE,
      so it skips the writeback wait.
    
      It then unlocks the page and exits. This means the dirty range
      [124K, 128K) will not be submitted until the next writeback happens
      for page 64K.
    
    This will never happen for previous kernels because:
    
    - For sector size == page size case
      Since one page is one sector, if a page is marked writeback it will
      not have dirty flags.
      So this corner case will never hit.
    
    - For sector size < page size case
      We never did subpage compression; a range could only be submitted
      for compression if it was fully page aligned.
      This change makes the subpage behavior mostly the same as non-subpage
      cases.
    
    [ENHANCEMENT]
    Instead of relying on the WB_SYNC_NONE check only, always wait for
    writeback if it's a subpage case.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    a8706d0 View commit details
  8. btrfs: make extent_range_clear_dirty_for_io() to handle sector size < page size cases
    
    For btrfs with sector size < page size (e.g. 4K sector size, 64K page
    size) and sector perfect compression support enabled, the following
    dirty range can lead to problems:
    
       0     32K     64K     96K    128K
       |     |///////||//////|    |/|
                                  124K
    
    In the above case, if we start writeback for that inode, the last dirty
    range [124K, 128K) will not be submitted, causing a reserved space
    leak:
    
    - Start writeback for page 0
      We find the range [32K, 96K) is suitable for compression, and queue it
      into a workqueue to do the delayed compression and submission.
    
    - Compression happens for range [32K, 96K)
      Function extent_range_clear_dirty_for_io() is called, however it
      only does full page handling, not considering any of the extra
      bitmaps for subpage cases.
    
      That function will clear page dirty for both page 0 and page 64K.
    
    - Writeback for the inode is done
      Because page 64K has its dirty flag cleared, it will not be considered
      as a writeback target.
    
    This means the range [124K, 128K) will not be submitted, and reserved
    space for it will be leaked.
    
    Fix this problem by using the subpage helper to clear the dirty flag.
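
    The difference between full page handling and subpage-aware handling
    can be illustrated with a per-sector dirty bitmap. This is a user-space
    model under stated assumptions (4K sectors, one 64K page, a 16-bit
    bitmap); struct page_model and all helper names are hypothetical and do
    not correspond to the kernel's subpage helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE      4096u
#define MODEL_PAGE_SIZE  65536u
#define SECTORS_PER_PAGE (MODEL_PAGE_SIZE / SECTOR_SIZE) /* 16 */

/* Hypothetical model of one 64K page with a per-sector dirty bitmap. */
struct page_model {
	uint16_t dirty_bitmap; /* bit n set => sector n of the page is dirty */
};

/* The page counts as dirty as long as any sector is dirty. */
static bool page_is_dirty(const struct page_model *p)
{
	return p->dirty_bitmap != 0;
}

/* Bitmask covering the sectors of [off, off + len) inside the page. */
static uint16_t range_mask(uint32_t off, uint32_t len)
{
	uint32_t first = off / SECTOR_SIZE;
	uint32_t nr = len / SECTOR_SIZE;

	assert(off % SECTOR_SIZE == 0 && len % SECTOR_SIZE == 0);
	assert(first + nr <= SECTORS_PER_PAGE);
	return (uint16_t)(((1u << nr) - 1u) << first);
}

static void set_dirty(struct page_model *p, uint32_t off, uint32_t len)
{
	p->dirty_bitmap |= range_mask(off, len);
}

/* Full page handling: drops the dirty state of every sector. */
static void clear_dirty_full_page(struct page_model *p)
{
	p->dirty_bitmap = 0;
}

/* Subpage-aware handling: clears only the sectors inside the range. */
static void clear_dirty_range(struct page_model *p, uint32_t off, uint32_t len)
{
	p->dirty_bitmap &= (uint16_t)~range_mask(off, len);
}
```

    For page 64K in the example, file range [64K, 96K) is page offset
    [0, 32K) and [124K, 128K) is page offset [60K, 64K). Clearing the whole
    page loses the last sector, while clearing only [0, 32K) keeps it dirty.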
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    a4ef54d View commit details
  9. btrfs: do not assume the full page range is not dirty in extent_writepage_io()
    
    The function extent_writepage_io() will submit the dirty sectors inside
    the page for the write.
    
    But recently, to co-operate with the incoming subpage compression
    enhancement, a new bitmap, btrfs_bio_ctrl::submit_bitmap, was
    introduced so that a subset of the dirty range can be excluded from
    submission.
    
    This is because we can have the following cases with 64K page size:
    
        0      16K       32K       48K       64K
        |      |/////////|         |/|
                                     52K
    
    For range [16K, 32K), we queue the dirty range for compression, which is
    run in a delayed workqueue.
    Then for range [48K, 52K), we go through the regular submission path.
    
    In that case, our btrfs_bio_ctrl::submit_bitmap will exclude the range
    [16K, 32K).
    
    The dirty flags for the range [16K, 32K) are only cleared when the
    compression is done, by the extent_clear_unlock_delalloc() call inside
    submit_one_async_extent().
    
    This patch fixes the false alert by removing the
    btrfs_folio_assert_not_dirty() check, since it's no longer correct for
    subpage compression cases.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    928b4de View commit details
  10. btrfs: move the delalloc range bitmap search into extent_io.c

    Currently for subpage (sector size < page size) cases, we reuse subpage
    locked bitmap to find out all delalloc ranges we have locked, and run
    all those found ranges.
    
    However such reuse is not perfect, e.g.:
    
        0       32K      64K      96K       128K
        |       |////////||///////|    |////|
                                       120K
    
    For the above range, writepage_delalloc() for page 0 will handle the
    range [32K, 96K); note that a delalloc range can extend beyond the page
    boundary.
    
    But writepage_delalloc() for page 64K will only handle range [120K,
    128K), as the previous run on page 0 has already handled range [64K,
    96K).
    Meanwhile for the writeback we should expect range [64K, 96K) to also be
    locked, which leads to a mismatch between the locked bitmap and the
    delalloc range.
    
    This is not causing problems yet, but it's still an inconsistent
    behavior.
    
    So instead of relying on the subpage locked bitmap, do the delalloc
    range search using a local @delalloc_bitmap, so that we can remove the
    existing btrfs_folio_find_writer_locked().
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    2bca8eb View commit details
  11. btrfs: mark all dirty sectors as locked inside writepage_delalloc()

    Currently we only mark sectors as locked if there is a *NEW* delalloc
    range for it.
    
    But a NEW delalloc range is not the same as the dirty sectors we want
    to submit, e.g.:
    
            0       32K      64K      96K       128K
            |       |////////||///////|    |////|
                                           120K
    
    For the above 64K page size case, writepage_delalloc() for page 0 will
    find and lock the delalloc range [32K, 96K), which is beyond the page
    boundary.
    
    Then when writepage_delalloc() is called for the page 64K, since [64K,
    96K) is already locked, only [120K, 128K) will be locked.
    
    This means, although range [64K, 96K) is dirty and will be submitted
    later by extent_writepage_io(), it will not be marked as locked.
    
    This is fine for now, as we call btrfs_folio_end_writer_lock_bitmap() to
    free every non-compressed sector, and compression is only allowed for
    full page range.
    
    But this is not safe for future sector perfect compression support, as
    this can lead to double folio unlock:
    
                  Thread A                 |           Thread B
    ---------------------------------------+--------------------------------
                                           | submit_one_async_extent()
                                           | |- extent_clear_unlock_delalloc()
    extent_writepage()                     |    |- btrfs_folio_end_writer_lock()
    |- btrfs_folio_end_writer_lock_bitmap()|       |- btrfs_subpage_end_and_test_writer()
       |                                   |       |  |- atomic_sub_and_test()
       |                                   |       |     /* Now the atomic value is 0 */
       |- if (atomic_read() == 0)          |       |
       |- folio_unlock()                   |       |- folio_unlock()
    
    The root cause is that the above range [64K, 96K) is dirtied and should
    also be locked, but it isn't.
    
    So to make everything more consistent and prepare for the incoming
    sector perfect compression, mark all dirty sectors as locked.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    c96d0e3 View commit details
  12. btrfs: allow compression even if the range is not page aligned

    Previously for btrfs with sector size smaller than page size (subpage),
    we only allow compression if the range is fully page aligned.
    
    This is to work around the asynchronous submission of compressed ranges,
    which delays the page unlock and writeback into a workqueue;
    furthermore, asynchronous submission can lock a multi-sector range
    across page boundaries.
    
    Such asynchronous submission makes it very hard to co-operate with other
    regular writes.
    
    With the recent changes to the subpage folio unlock path, now
    asynchronous submission of compressed pages can co-operate with regular
    submission, so enable sector perfect compression if it's an experimental
    build.
    
    The ETA for moving this feature out of experimental is 6.15, and I hope
    all remaining corner cases can be exposed before that.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    1d2fbb7 View commit details
  13. btrfs: avoid unnecessary device path update for the same device

    [PROBLEM]
    It is very common for udev to trigger a device scan, and every time a
    mounted btrfs device gets re-scanned through a different soft link, we
    get unnecessary device path updates. This is especially common for
    LVM based storage:
    
     # lvs
      scratch1 test -wi-ao---- 10.00g
      scratch2 test -wi-a----- 10.00g
      scratch3 test -wi-a----- 10.00g
      scratch4 test -wi-a----- 10.00g
      scratch5 test -wi-a----- 10.00g
      test     test -wi-a----- 10.00g
    
     # mkfs.btrfs -f /dev/test/scratch1
     # mount /dev/test/scratch1 /mnt/btrfs
     # dmesg -c
     [  205.705234] BTRFS: device fsid 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9 devid 1 transid 6 /dev/mapper/test-scratch1 (253:4) scanned by mount (1154)
     [  205.710864] BTRFS info (device dm-4): first mount of filesystem 7be2602f-9e35-4ecf-a6ff-9e91d2c182c9
     [  205.711923] BTRFS info (device dm-4): using crc32c (crc32c-intel) checksum algorithm
     [  205.713856] BTRFS info (device dm-4): using free-space-tree
     [  205.722324] BTRFS info (device dm-4): checking UUID tree
    
    So far so good, but even if we just touch any soft link of
    "dm-4", we get some unnecessary device path updates.
    
     # touch /dev/mapper/test-scratch1
     # dmesg -c
     [  469.295796] BTRFS info: devid 1 device path /dev/mapper/test-scratch1 changed to /dev/dm-4 scanned by (udev-worker) (1221)
     [  469.300494] BTRFS info: devid 1 device path /dev/dm-4 changed to /dev/mapper/test-scratch1 scanned by (udev-worker) (1221)
    
    Such device path renames are unnecessary and can lead to random path
    changes due to the udev race.
    
    [CAUSE]
    Inside device_list_add(), we use a very primitive way of checking
    whether the device has changed: strcmp().
    
    This can never handle links well, no matter if they are hard or soft
    links.
    
    So every different link of the same device will be treated as a different
    device, causing the unnecessary device path update.
    
    [FIX]
    Introduce a helper, is_same_device(), and use path_equal() to properly
    detect the same block device, so that different soft links won't
    trigger the rename race.
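
    The gap between comparing names and comparing what the names resolve to
    can be shown in user space. The sketch below is only an analogue of the
    idea, not the kernel helper: same_by_name() and same_by_identity() are
    hypothetical names, and comparing (st_dev, st_ino) from stat() is an
    assumption of this example standing in for the kernel's path_equal().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Primitive check: two names match only if the strings are identical. */
static bool same_by_name(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	return strcmp(a, b) == 0;
}

/*
 * Link-aware check (user-space analogue): stat() follows symbolic links,
 * so two names that resolve to the same file report the same
 * (st_dev, st_ino) pair. Returns false if either name cannot be resolved.
 */
static bool same_by_identity(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return false;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

    On the system from the log above, same_by_name() would report
    "/dev/mapper/test-scratch1" and "/dev/dm-4" as different, while a
    link-aware comparison is expected to treat them as the same device.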
    
    Reviewed-by: Filipe Manana <[email protected]>
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
    Reported-by: Fabian Vogt <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    2e8b6bc View commit details
  14. btrfs: canonicalize the device path before adding it

    [PROBLEM]
    Currently btrfs accepts any file path for its device, resulting in some
    weird situations:
    
     # ./mount_by_fd /dev/test/scratch1  /mnt/btrfs/
    
    The program has the following source code:
    
     #include <fcntl.h>
     #include <stdio.h>
     #include <sys/mount.h>
    
     int main(int argc, char *argv[]) {
    	int fd = open(argv[1], O_RDWR);
    	char path[256];
    	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    	return mount(path, argv[2], "btrfs", 0, NULL);
     }
    
    Then we can have the following weird device path:
    
     BTRFS: device fsid 2378be81-fe12-46d2-a9e8-68cf08dd98d5 devid 1 transid 7 /proc/self/fd/3 (253:2) scanned by mount_by_fd (18440)
    
    Normally it's not a big deal, and later udev can trigger a device path
    rename. But if udev didn't trigger, the device path "/proc/self/fd/3"
    will show up in mtab.
    
    [CAUSE]
    The filename "/proc/self/fd/3" refers to the opened file descriptor 3.
    In the above case, it's exactly the device we want to open, i.e. it
    points to "/dev/test/scratch1", which is another symlink pointing to
    "/dev/dm-2".
    
    Inside the kernel we resolve the mount source using LOOKUP_FOLLOW, which
    follows the symbolic link and grabs the proper block device.
    
    But inside btrfs we also save the filename into btrfs_device::name, and
    utilize that member to report our mount source, which leads to the above
    situation.
    
    [FIX]
    Instead of unconditionally trusting the path, check if the original file
    (not following the symbolic link) is inside "/dev/"; if not, then
    manually look up the path to its final destination, and use that as our
    device path.
    
    This allows us to still use symbolic links, like
    "/dev/mapper/test-scratch" from LVM2, which is required for fstests runs
    with LVM2 setup.
    
    And for really weird names, like the above case, we resolve it to
    "/dev/dm-2" instead.
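
    The policy in [FIX] can be sketched in user space as follows. This is a
    simplified illustration, not the kernel implementation:
    canonical_device_path() is a hypothetical helper, the "/dev/" test here
    is a plain string prefix check, and realpath() stands in for the manual
    path lookup described above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: keep names that already live under "/dev/" (so
 * LVM-style symlinks stay readable) and resolve everything else to its
 * final destination. Returns 0 on success, -1 on failure.
 */
static int canonical_device_path(const char *path, char *out, size_t outlen)
{
	char resolved[PATH_MAX];
	const char *src = path;

	assert(path != NULL && out != NULL);
	if (strncmp(path, "/dev/", 5) != 0) {
		/* Not under /dev/: follow all links to the final target. */
		if (!realpath(path, resolved))
			return -1;
		src = resolved;
	}
	if (strlen(src) >= outlen)
		return -1;
	strcpy(out, src);
	return 0;
}
```

    With this sketch a name such as "/dev/mapper/test-scratch1" is kept
    unchanged, while a name outside "/dev/" is replaced by whatever it
    finally resolves to.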
    
    Reviewed-by: Filipe Manana <[email protected]>
    Link: https://bugzilla.suse.com/show_bug.cgi?id=1230641
    Reported-by: Fabian Vogt <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    7e06de7 View commit details
  15. btrfs: remove code duplication in ordered extent finishing

    Remove the duplicated transaction joining, block reserve setting and raid
    extent inserting in btrfs_finish_ordered_extent().
    
    While at it, also abort the transaction in case inserting a RAID
    stripe-tree entry fails.
    
    Suggested-by: Naohiro Aota <[email protected]>
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Nov 11, 2024
    2206265 View commit details
  16. btrfs: qgroups: remove bytenr field from struct btrfs_qgroup_extent_record
    
    Now that we track qgroup extent records in an xarray we don't need to have
    a "bytenr" field in struct btrfs_qgroup_extent_record, since we can get
    it from the index of the record in the xarray.
    
    So remove the field and grab the bytenr from either the index key or any
    other place where it's available (delayed refs). This reduces the size of
    struct btrfs_qgroup_extent_record from 40 bytes down to 32 bytes, meaning
    that we now can store 128 instances of this structure instead of 102 per
    4K page.
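
    The per-page counts follow from integer division of the page size by
    the structure size. A minimal sketch of that arithmetic, assuming a
    4096-byte page and ignoring any allocator overhead (records_per_page()
    is a hypothetical helper used only for this illustration):

```c
#include <assert.h>
#include <stddef.h>

/* How many whole records of record_size bytes fit into one page. */
static size_t records_per_page(size_t page_size, size_t record_size)
{
	assert(record_size != 0);
	return page_size / record_size;
}
```

    With these assumptions, 4096 / 40 rounds down to 102 and 4096 / 32 is
    exactly 128, matching the numbers above.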
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    c28b97f View commit details
  17. btrfs: store fs_info in a local variable at btrfs_qgroup_trace_extent_post()
    
    Instead of extracting fs_info from the transaction multiple times, store
    it in a local variable and use it.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    fad884b View commit details
  18. btrfs: remove unnecessary delayed refs locking at btrfs_qgroup_trace_extent()
    
    There's no need to hold the delayed refs spinlock when calling
    btrfs_qgroup_trace_extent_nolock() from btrfs_qgroup_trace_extent(), since
    it doesn't change anything in delayed refs and it only changes the xarray
    used to track qgroup extent records, which is protected by the xarray's
    lock.
    
    Holding the lock is only adding unnecessary lock contention with other
    tasks that actually need to take the lock to add/remove/change delayed
    references. So remove the locking.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    c5e2680 View commit details
  19. btrfs: always use delayed_refs local variable at btrfs_qgroup_trace_extent()
    
    Instead of dereferencing the delayed refs from the transaction multiple
    times, store it early in the local variable and then always use the
    variable.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    db58e15 View commit details
  20. btrfs: remove pointless initialization at btrfs_qgroup_trace_extent()

    The qgroup record was allocated with kzalloc(), so it's pointless to set
    its old_roots member to NULL. Remove the assignment.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    287d1cf View commit details
  21. btrfs: remove redundant stop_loop variable in scrub_stripe()

    The variable stop_loop was originally introduced in commit 625f1c8
    ("Btrfs: improve the loop of scrub_stripe"). It was initialized to 0 in
    commit 3b080b2 ("Btrfs: scrub raid56 stripes in the right way").
    However, in a later commit 18d30ab ("btrfs: scrub: use
    scrub_simple_mirror() to handle RAID56 data stripe scrub"), the code
    that modified stop_loop was removed, making the variable redundant.
    
    Currently, stop_loop is only initialized with 0 and is never used or
    modified within the scrub_stripe() function. As a result, this patch
    removes the stop_loop variable to clean up the code and eliminate
    unnecessary redundancy.
    
    This change has no impact on functionality, as stop_loop was never
    utilized in any meaningful way in the final version of the code.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Riyan Dhiman <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Ryand1234 authored and kdave committed Nov 11, 2024
    522945b View commit details
  22. btrfs: remove unused page_to_inode and page_to_fs_info macros

    These macros are no longer used after the "btrfs: Cleaned up folio->page
    conversion" patch series [1] was applied, so remove them.
    
    [1]: https://patchwork.kernel.org/project/linux-btrfs/cover/[email protected]/
    
    Reviewed-by: Neal Gompa <[email protected]>
    Signed-off-by: Youling Tang <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Youling Tang authored and kdave committed Nov 11, 2024
    fa984c9 View commit details
  23. btrfs: correct typos in multiple comments across various files

    Fix some confusing spelling errors that were currently identified,
    the details are as follows:
    
    	block-group.c: 2800: 	uncompressible 	==> incompressible
    	extent-tree.c: 3131:	EXTEMT		==> EXTENT
    	extent_io.c: 3124: 	utlizing 	==> utilizing
    	extent_map.c: 1323: 	ealier		==> earlier
    	extent_map.c: 1325:	possiblity	==> possibility
    	fiemap.c: 189:		emmitted	==> emitted
    	fiemap.c: 197:		emmitted	==> emitted
    	fiemap.c: 203:		emmitted	==> emitted
    	transaction.h: 36:	trasaction	==> transaction
    	volumes.c: 5312:	filesysmte	==> filesystem
    	zoned.c: 1977:		trasnsaction	==> transaction
    
    Signed-off-by: Shen Lichuan <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Shen Lichuan authored and kdave committed Nov 11, 2024
    2144e1f View commit details
  24. btrfs: tests: add selftests for raid-stripe-tree

    Add first stash of very basic self tests for the RAID stripe-tree.
    
    More test cases will follow exercising the tree.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Nov 11, 2024
    506be4d View commit details
  25. btrfs: remove unused btrfs_free_squota_rsv()

    btrfs_free_squota_rsv() was added in commit
    e85a0ad ("btrfs: ensure releasing squota reserve on head refs")
    but has remained unused since then.
    Remove it as we don't seem to need it and it was probably a leftover.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Dr. David Alan Gilbert <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Dr. David Alan Gilbert authored and kdave committed Nov 11, 2024
    004641b View commit details
  26. btrfs: remove unused btrfs_is_parity_mirror()

    btrfs_is_parity_mirror() has been unused since commit 4886ff7
    ("btrfs: introduce a new helper to submit write bio for repair").
    Remove it as the code was refactored and we don't need the helper
    anymore.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Dr. David Alan Gilbert <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Dr. David Alan Gilbert authored and kdave committed Nov 11, 2024
    441ffe8 View commit details
  27. btrfs: remove unused btrfs_try_tree_write_lock()

    btrfs_try_tree_write_lock() has been unused since commit
    50b21d7 ("btrfs: submit a writeback bio per extent_buffer").
    Remove it as we don't need it anymore.
    
    Reviewed-by: Christoph Hellwig <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Dr. David Alan Gilbert <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Dr. David Alan Gilbert authored and kdave committed Nov 11, 2024
    b628c13 View commit details
  28. btrfs: remove the dirty_page local variable

    Inside btrfs_buffered_write(), we have a local variable @dirty_pages,
    recording the number of pages we dirtied in the current iteration.
    
    However we do not really need that variable, since it can be calculated
    from @pos and @copied.
    
    In fact there is already a problem inside the short copy path, where we
    use @dirty_pages to calculate the range we need to release.
    But that usage assumes sectorsize == PAGE_SIZE, which is no longer true.
    
    Instead of keeping @dirty_pages and causing incorrect usage, just
    calculate the number of dirtied pages inside btrfs_dirty_pages().
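
    The calculation from @pos and @copied can be sketched as a round-up
    division. This is an illustration under assumptions, not the code in
    btrfs_dirty_pages(): nr_dirtied_pages() is a hypothetical helper, the
    page size is fixed at 4096 bytes, and a zero-length copy is taken to
    dirty no pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/*
 * Number of pages touched by a copy of `copied` bytes starting at file
 * offset `pos`: round (offset within the first page + copied) up to a
 * whole number of pages.
 */
static size_t nr_dirtied_pages(uint64_t pos, size_t copied)
{
	uint64_t offset = pos % SKETCH_PAGE_SIZE;

	assert(copied <= SIZE_MAX - 2 * SKETCH_PAGE_SIZE);
	if (copied == 0)
		return 0;
	return (size_t)((offset + copied + SKETCH_PAGE_SIZE - 1) /
			SKETCH_PAGE_SIZE);
}
```

    A two-byte copy starting at offset 4095 crosses a page boundary and
    therefore counts as two pages, which shows why the count depends on
    @pos and not on @copied alone.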
    
    Reviewed-by: Josef Bacik <[email protected]>
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    00c5135 View commit details
  29. btrfs: simplify the page uptodate preparation for prepare_pages()

    Currently inside prepare_pages(), we handle the leading and trailing
    pages differently, and skip the middle pages (if any).  This is to avoid
    reading pages which are fully covered by the dirty range.
    
    Refactor the code by moving all checks (alignment check, range check,
    force read check) into prepare_uptodate_page().
    
    So that prepare_pages() only needs to iterate all the pages
    unconditionally.
    
    And since we're here, also update prepare_uptodate_page() to use the
    folio API rather than the old page API.
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    7f91c6a View commit details
  30. btrfs: handle empty list of NOCOW ordered extents with checksum list

    Currently we BUG_ON() in btrfs_finish_one_ordered() if we are finishing
    an ordered extent that is flagged as NOCOW, but its checksum list is
    not empty.
    
    This is clearly a logic error which we can recover from by aborting the
    transaction.
    
    For developer builds which enable CONFIG_BTRFS_ASSERT, also ASSERT()
    that the list is empty.
    
    Suggested-by: Filipe Manana <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Nov 11, 2024
    61b4d75 View commit details
  31. btrfs: return ENODATA in case RST lookup fails

    In case a lookup in the RAID stripe-tree fails, return ENODATA instead of
    ENOENT to better distinguish stripe-tree lookups from other code paths
    where we return ENOENT.
    
    Suggested-by: Josef Bacik <[email protected]>
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Nov 11, 2024
    5e72aab View commit details
  32. btrfs: scrub: skip initial RST lookup errors

    Performing the initial extent sector read on a RAID stripe-tree backed
    filesystem with pre-allocated extents will cause the RAID stripe-tree
    lookup code to return ENODATA, as pre-allocated extents do not have any
    on-disk bytes and thus no RAID stripe-tree entries.
    
    But the current scrub read code marks these extents as errors, because
    the lookup fails.
    
    If btrfs_map_block() returns -ENODATA, it means that the call to
    btrfs_get_raid_extent_offset() returned -ENODATA, because there is no
    entry for the corresponding range in the RAID stripe-tree. But as this
    range is in the extent tree it means we've hit a pre-allocated extent. In
    this case, don't mark the sector in the stripe's error bitmaps as faulty
    and carry on to the next.
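
    The handling described above amounts to treating one specific error
    code as "skip" rather than "fail". A minimal user-space sketch of that
    classification, assuming the usual convention of 0 for success and a
    negative errno for failure; enum sector_result and classify_lookup()
    are hypothetical names and not part of the scrub code:

```c
#include <assert.h>
#include <errno.h>

/* Outcome for one sector after the mapping lookup. */
enum sector_result { SECTOR_OK, SECTOR_SKIPPED, SECTOR_ERROR };

/*
 * Hypothetical classifier for the return value of a mapping lookup:
 * -ENODATA is taken to mean a pre-allocated extent without a stripe-tree
 * entry, so the sector is skipped instead of being marked as faulty.
 */
static enum sector_result classify_lookup(int ret)
{
	assert(ret <= 0);
	if (ret == 0)
		return SECTOR_OK;
	if (ret == -ENODATA)
		return SECTOR_SKIPPED;
	return SECTOR_ERROR;
}
```

    Any other failure, such as -ENOENT or -EIO, still counts as an error in
    this model, which is why a distinct return code for stripe-tree lookups
    is useful.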
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Nov 11, 2024
    9fde8a6 View commit details
  33. btrfs: qgroup: run delayed iputs after ordered extent completion

    When trying to flush qgroups in order to release space we run delayed
    iputs in order to release space from recently deleted files (their link
    count reached zero), and then we start delalloc and wait for any
    existing ordered extents to complete.
    
    However there's a time window here where we end up not doing the final
    iput on a deleted file which could release necessary space:
    
    1) An unlink operation starts;
    
    2) During the unlink, or right before it completes, delalloc is flushed
       and an ordered extent is created;
    
    3) When the ordered extent is created, the inode's ref count is
       incremented (with igrab() at alloc_ordered_extent());
    
    4) When the unlink finishes it doesn't drop the last reference on the
       inode and so it doesn't trigger inode eviction to delete all of
       the inode's items in its root and drop all references on its data
       extents;
    
    5) Another task enters try_flush_qgroup() to try to release space,
       it runs all delayed iputs, but there's no delayed iput yet for that
       deleted file because the ordered extent hasn't completed yet;
    
    6) Then at try_flush_qgroup() we wait for the ordered extent to complete
       and that results in adding a delayed iput at btrfs_put_ordered_extent()
       when called from btrfs_finish_one_ordered();
    
    7) Adding the delayed iput results in waking the cleaner kthread if it's
       not running already. However it may take some time for it to be
       scheduled, or it may be running but busy running auto defrag, dropping
       deleted snapshots or doing other work, so by the time we return from
       try_flush_qgroup() the space for the deleted file isn't released.
    
    Improve on this by running delayed iputs only after flushing delalloc
    and waiting for ordered extent completion.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    dd40283
  34. btrfs: remove btrfs_set_range_writeback()

    The function btrfs_set_range_writeback() was originally a callback for
    metadata and data, to mark a range with writeback flag.
    
    Then it was converted into a common function call for both metadata and
    data.
    
    From the very beginning, the function was only called on a full page,
    and it was later converted to handle a range inside a page.
    
    But it never needed to handle multiple pages, and since commit
    8189197 ("btrfs: refactor __extent_writepage_io() to do
    sector-by-sector submission") the function was only called on a
    sector-by-sector basis.
    
    This makes the function unnecessary, and it can be replaced by a simple
    btrfs_folio_set_writeback() call instead.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    0fcaf92
  35. btrfs: zstd: assert the timer pointer in callback

    Make sure we got the right timer struct for the zstd workspace reclaim
    work.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    2fac7e1
  36. btrfs: drop unused parameter path from btrfs_tree_mod_log_rewind()

    The path parameter was used for our own locking, that got converted to
    rwsem eventually. Last usage in ac5887c ("btrfs: locking: remove
    all the blocking helpers").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    25a1399
  37. btrfs: drop unused parameter ctx from batch_delete_dir_index_items()

    The ctx parameter is not used, we can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    2d5903d
  38. btrfs: drop unused parameter fs_info from wait_reserve_ticket()

    The parameter is not used, we can also reach it from the space info if
    needed in the future.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    a6563fa
  39. btrfs: drop unused parameter fs_info from do_reclaim_sweep()

    The parameter is unused and we can get it from space info if needed.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    343a635
  40. btrfs: send: drop unused parameter num from iterate_inode_ref_t callbacks
    
    None of the ref iteration callbacks needs the num parameter (this is for
    the directory item iteration), so we can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    a1e76e3
  41. btrfs: send: drop unused parameter index from iterate_inode_ref_t callbacks
    
    None of the ref iteration callbacks needs the index parameter (this is
    for the directory item iteration), so we can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    a86a735
  42. btrfs: scrub: drop unused parameter sctx from scrub_submit_extent_sector_read()
    
    The parameter is unused and we can reach sctx from scrub stripe if
    needed.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    f2c144f
  43. btrfs: drop unused parameter map from scrub_simple_mirror()

    The parameter map used to be passed to scrub_extent() until
    e02ee89 ("btrfs: scrub: switch scrub_simple_mirror() to
    scrub_stripe infrastructure"), where the scrub implementation was
    completely reworked.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    887d417
  44. btrfs: qgroup: drop unused parameter fs_info from __del_qgroup_rb()

    We don't need fs_info here, everything is reachable from qgroup.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    2651f43
  45. btrfs: drop unused transaction parameter from btrfs_qgroup_add_swapped_blocks()
    
    The caller replace_path() runs under transaction but we don't need it in
    btrfs_qgroup_add_swapped_blocks().
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    d7f4b4e
  46. btrfs: lzo: drop unused parameter level from lzo_alloc_workspace()

    The LZO compression has only one level, we don't need to pass the
    parameter.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    3f4b1bc
  47. btrfs: drop unused parameter argp from btrfs_ioctl_quota_rescan_wait()

    We don't need the user passed parameter, rescan is a filesystem
    operation so fs_info is sufficient.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    fd68c60
  48. btrfs: drop unused parameter inode from read_inline_extent()

    We don't need the inode pointer to read inline extent, it's all
    accessible from the path pointer.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    372e5f8
  49. btrfs: drop unused parameter offset from __cow_file_range_inline()

    We don't need offset for inline extents, they always start from 0.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    e469da5
  50. btrfs: drop unused parameter file_offset from btrfs_encoded_read_regular_fill_pages()
    
    The file_offset parameter used to be passed to encoded read struct but
    was removed in commit b665aff ("btrfs: remove unused members from
    struct btrfs_encoded_read_private").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    590168e
  51. btrfs: drop unused parameter iov_iter from btrfs_write_check()

    The parameter 'from' has never been used since commit b8d8e1f
    ("btrfs: introduce btrfs_write_check()"), this is for buffered write.
    Direct io write needs it so it was probably an interface thing, but we
    can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    cc5fe81
  52. btrfs: drop unused parameter refs from visit_node_for_delete()

    The parameter duplicates what can be effectively obtained from
    wc->refs[level - 1] and this is what's actually used inside. Added in
    commit 2b73c7e ("btrfs: unify logic to decide if we need to walk
    down into a node during snapshot delete").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    f8c4d59
  53. btrfs: drop unused parameter mask from try_release_extent_state()

    The mask parameter used for allocations got unified to GFP_NOFS and
    removed from relevant functions in 1d12680 ("btrfs: drop gfp from
    parameter extent state helpers").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    2decc28
  54. btrfs: drop unused parameter fs_info from folio_range_has_eb()

    The parameter was added in 8ff8466 ("btrfs: support subpage for
    extent buffer page release") for page but hasn't been used since, so we
    can drop it.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    ec315b4
  55. btrfs: drop unused parameter options from open_ctree()

    Since the new mount option parser in commit ad21f15 ("btrfs:
    switch to the new mount API") we don't pass the options like that
    anymore.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    87cbab8
  56. btrfs: drop unused parameter data from btrfs_fill_super()

    The only caller passes NULL, we can drop the parameter. This is since
    the new mount option parser done in 3bb17a2 ("btrfs: add get_tree
    callback for new mount API").
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    01c5db7
  57. btrfs: drop unused parameter transaction from alloc_log_tree()

    The function got split in commit 6ab6ebb ("btrfs: split
    alloc_log_tree()") and since then transaction parameter has been unused.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    d12a1a2
  58. btrfs: drop unused parameter fs_info from btrfs_match_dir_item_name()

    Cascaded removal of fs_info that is not needed in several functions.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    8c7cd2b
  59. btrfs: drop unused parameter level from alloc_heuristic_ws()

    The compression heuristic pass does not need a level, so we can drop the
    parameter.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    a9c50c9
  60. btrfs: reduce lock contention when eb cache miss for btree search

    When crawling btree, if an eb cache miss occurs, we change to use the eb
    read lock and release all previous locks (including the parent lock) to
    reduce lock contention.
    
    If an eb cache miss occurs in a leaf and needs to execute IO, before this
    change we released locks only from level 2 and up and read the leaf's
    content from disk while holding a lock on its parent (level 1), causing
    unnecessary lock contention on the parent. After this change we release
    locks from level 1 and up, but we lock level 0 and read the leaf's
    content from disk.
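
    The difference in which levels stay locked during the read can be
    sketched with a toy model. The toy_path structure and the toy_*()
    helpers are hypothetical and only track one lock flag per tree level;
    they do not reflect the kernel's path or locking implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_LEVEL 8

/* Hypothetical stand-in for a search path: one lock flag per tree level. */
struct toy_path {
	bool locked[TOY_MAX_LEVEL];
};

/* Release every held lock at or above from_level. */
static void toy_unlock_from(struct toy_path *p, int from_level)
{
	assert(from_level >= 0 && from_level <= TOY_MAX_LEVEL);
	for (int level = from_level; level < TOY_MAX_LEVEL; level++)
		p->locked[level] = false;
}

/* Before: on a leaf cache miss, the parent (level 1) stays locked during IO. */
static void toy_leaf_miss_before(struct toy_path *p)
{
	toy_unlock_from(p, 2);
	/* ... the leaf would be read from disk with level 1 still locked ... */
}

/* After: lock the leaf (level 0), drop level 1 and up, then do the IO. */
static void toy_leaf_miss_after(struct toy_path *p)
{
	p->locked[0] = true;
	toy_unlock_from(p, 1);
	/* ... the leaf would be read from disk holding only its own lock ... */
}
```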
    
    Because we have prepared the check parameters and hold the read lock of
    the eb, we can ensure that no race occurs during the check and causes
    unexpected errors.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Robbie Ko <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Robbie Ko authored and kdave committed Nov 11, 2024
    9978599
  61. btrfs: add and use helper to remove extent map from its inode's tree

    Move the common code to remove an extent map from its inode's tree into a
    helper function and use it, reducing duplicated code.
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    03ba050
  62. btrfs: make the extent map shrinker run asynchronously as a work queue job
    
    Currently the extent map shrinker is run synchronously for kswapd tasks
    that end up calling the fs shrinker (fs/super.c:super_cache_scan()).
    This has some disadvantages and for some heavy workloads with memory
    pressure it can cause some delays and stalls that make a machine
    unresponsive for some periods. This happens because:
    
    1) We can have several kswapd tasks on machines with multiple NUMA zones,
       and running the extent map shrinker concurrently can cause high
       contention on some spin locks, namely the spin locks that protect
       the radix tree that tracks roots, the per root xarray that tracks
       open inodes and the list of delayed iputs. This not only delays the
       shrinker but also causes high CPU consumption and makes the task
       running the shrinker monopolize a core, resulting in the symptoms
       of an unresponsive system. This was noted in previous commits such as
       commit ae1e766 ("btrfs: only run the extent map shrinker from
       kswapd tasks");
    
    2) The extent map shrinker's iteration over inodes can often be slow, even
       after changing the data structure that tracks open inodes for a root
       from a red black tree (up to kernel 6.10) to an xarray (kernel 6.10+).
       While the transition to the xarray made things a bit faster, it's
       still somewhat slow - for example in a test scenario with 10000 inodes
       that have no extent maps loaded, the extent map shrinker took between
       5ms and 8ms, using a release, non-debug kernel. Iterating over the
       extent maps of an inode can also be slow if we have an inode with many
       thousands of extent maps, since we use a red black tree to track and
       search extent maps. So having the extent map shrinker run synchronously
       adds extra delay for other things a kswapd task does.
    
    So make the extent map shrinker run asynchronously as a job for the
    system unbounded workqueue, just like what we do for data and metadata
    space reclaim jobs.
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    1020443
  63. btrfs: simplify tracking progress for the extent map shrinker

    Now that the extent map shrinker can only be run by a single task (as a
    work queue item) there is no need to keep the progress of the shrinker
    protected by a spinlock and passing the progress to trace events as
    parameters. So remove the lock and simplify the arguments for the trace
    events.
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    70a5f9e
  64. btrfs: rename extent map shrinker members from struct btrfs_fs_info

    The names for the members of struct btrfs_fs_info related to the extent
    map shrinker are a bit too long, so rename them to be shorter by replacing
    the "extent_map_" prefix with the "em_" prefix.
    
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    e7fa845
  65. btrfs: re-enable the extent map shrinker

    Now that the extent map shrinker can only be run by a single task and runs
    asynchronously as a work queue job, enable it as it can no longer cause
    stalls on tasks allocating memory and entering the extent map shrinker
    through the fs shrinker (implemented by btrfs_free_cached_objects()).
    
    This is crucial to prevent exhaustion of memory due to unbounded extent
    map creation, primarily with direct IO but also for buffered IO on files
    with holes. This problem, for the direct IO case, was first reported in
    the Link tag below. That report was added to a Link tag of the first patch
    that introduced the extent map shrinker, commit 956a17d ("btrfs: add
    a shrinker for extent maps"), however the Link tag disappeared somehow
    from the committed patch (but was included in the submitted patch to the
    mailing list), so adding it below for future reference.
    
    Link: https://lore.kernel.org/linux-btrfs/[email protected]/
    Reviewed-by: Josef Bacik <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    a8371fc
  66. btrfs: remove redundant level argument from read_block_for_search()

    The level parameter passed to read_block_for_search() always matches the
    level of the extent buffer passed in the "eb_ret" parameter, which we are
    also extracting into the "parent_level" local variable.
    
    So remove the level parameter and instead use the "parent_level" variable
    which in fact has a better name (it's the level of the parent node from
    which we are reading a child node/leaf).
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    2b1ef80
  67. btrfs: simplify arguments for btrfs_verify_level_key()

    The only caller of btrfs_verify_level_key() is read_block_for_search() and
    it's passing 3 arguments to it that can be extracted from its on stack
    variable of type struct btrfs_tree_parent_check.
    
    So change btrfs_verify_level_key() to accept an argument of type
    struct btrfs_tree_parent_check instead of level, first key and parent
    transid arguments.
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    c88ebf1
  68. btrfs: remove redundant initializations for struct btrfs_tree_parent_check
    
    It's pointless to initialize the has_first_key field of the stack local
    btrfs_tree_parent_check structure at btrfs_tree_parent_check() and at
    btrfs_qgroup_trace_subtree() since all fields not explicitly initialized
    are zeroed out. In the case of the first function it's a bit odd because
    we are assigning 0 and the field is of type bool, however not incorrect
    since a 0 is converted to false.
    
    Just remove the explicit initializations due to their redundancy.
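
    The C rule relied on here is that members not named in an initializer
    list are zero-initialized. A minimal sketch, assuming a hypothetical
    toy_parent_check structure whose fields are illustrative and not the
    real layout of struct btrfs_tree_parent_check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in; the field set is illustrative, not the kernel's. */
struct toy_parent_check {
	uint64_t transid;
	int level;
	bool has_first_key;
};

/* Build a check with only some fields named in the initializer. */
static struct toy_parent_check toy_make_check(uint64_t transid, int level)
{
	struct toy_parent_check check = {
		.transid = transid,
		.level = level,
		/* has_first_key is not named, so C initializes it to false */
	};

	assert(level >= 0);
	return check;
}
```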
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    b8e63ea
  69. btrfs: remove local generation variable from read_block_for_search()

    It's redundant to have the 'gen' variable since we already have the same
    value in the local btrfs_tree_parent_check structure. So remove it and
    instead use the structure's field.
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    4b5c120
  70. btrfs: do not clear read-only when adding sprout device

    If you follow the seed/sprout wiki, it suggests the following workflow:
    
    btrfstune -S 1 seed_dev
    mount seed_dev mnt
    btrfs device add sprout_dev
    mount -o remount,rw mnt
    
    The first mount mounts the FS readonly, which results in not setting
    BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
    somewhat surprisingly clears the readonly bit on the sb (though the
    mount is still practically readonly, from the user's perspective...).
    Finally, the remount checks the readonly bit on the sb against the flag
    and sees no change, so it does not run the code intended to run on
    ro->rw transitions, leaving BTRFS_FS_OPEN unset.
    
    As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
    does no work. This results in leaking deleted snapshots until we run out
    of space.
    
    I propose fixing it at the first departure from what feels reasonable:
    when we clear the readonly bit on the sb during device add.
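
    The sequence described above can be modelled in a few lines of
    user-space C. This is a sketch under assumptions: toy_sb and the toy_*()
    functions are hypothetical, and fs_open merely stands in for
    BTRFS_FS_OPEN; none of this is the kernel's mount code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy superblock state; not the kernel's struct super_block. */
struct toy_sb {
	bool rdonly;  /* stand-in for the readonly bit on the sb */
	bool fs_open; /* stand-in for BTRFS_FS_OPEN */
};

/* Mounting a seed device gives a read-only fs without the open flag. */
static struct toy_sb toy_mount_seed(void)
{
	struct toy_sb sb = { .rdonly = true, .fs_open = false };

	return sb;
}

/* Device add; clear_rdonly == true models the behaviour before the fix. */
static void toy_device_add(struct toy_sb *sb, bool clear_rdonly)
{
	assert(sb);
	if (clear_rdonly)
		sb->rdonly = false;
}

/* Remount rw: the ro->rw code only runs if the readonly bit changes. */
static void toy_remount_rw(struct toy_sb *sb)
{
	assert(sb);
	if (!sb->rdonly)
		return; /* no change seen, transition code is skipped */
	sb->rdonly = false;
	sb->fs_open = true; /* part of the ro->rw transition */
}
```

    With clear_rdonly set, the remount sees no change and fs_open stays
    false; without it, the remount performs the transition.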
    
    A new fstest I have written reproduces the bug and confirms the fix.
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Boris Burkov <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    boryas authored and kdave committed Nov 11, 2024
    70958a9
  71. btrfs: remove unused btrfs_folio_start_writer_lock()

    This function is not really suitable to lock a folio, as it lacks the
    proper mapping checks, thus the locked folio may not even belong to
    btrfs.
    
    And due to the above reason, the last user inside lock_delalloc_folios()
    is already removed, and we can remove this function.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    8511074
  72. btrfs: unify to use writer locks for subpage locking

    Since commit d7172f5 ("btrfs: use per-buffer locking for
    extent_buffer reading"), metadata read no longer relies on the subpage
    reader locking.
    
    This means we do not need to maintain a different metadata/data split
    for locking, so we can convert the existing reader lock users by:
    
    - add_ra_bio_pages()
      Convert to btrfs_folio_set_writer_lock()
    
    - end_folio_read()
      Convert to btrfs_folio_end_writer_lock()
    
    - begin_folio_read()
      Convert to btrfs_folio_set_writer_lock()
    
    - folio_range_has_eb()
      Remove the subpage->readers checks, since it is always 0.
    
    - Remove btrfs_subpage_start_reader() and btrfs_subpage_end_reader()
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    336e69f
  73. btrfs: rename btrfs_folio_(set|start|end)_writer_lock()

    Since there is no user of reader locks, rename the writer locks into a
    more generic name, by removing the "_writer" part from the name.
    
    And also rename btrfs_subpage::writer into btrfs_subpage::locked.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    0f71202
  74. btrfs: use str_yes_no() helper function in btrfs_dump_free_space()

    Remove hard-coded strings by using the str_yes_no() and str_no_yes()
    helper functions.
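
    For readers unfamiliar with these helpers, the idea can be sketched in
    user space. The two functions below are stand-ins written for this
    example that mimic the helpers named above, and toy_format_block_group()
    with its field names is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* User-space stand-ins that mimic the helpers named in the commit. */
static inline const char *str_yes_no(bool v)
{
	return v ? "yes" : "no";
}

static inline const char *str_no_yes(bool v)
{
	return str_yes_no(!v);
}

/* Hypothetical formatter: the field names are illustrative only. */
static int toy_format_block_group(char *buf, size_t len, bool is_ro, bool is_full)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "read-only: %s, has free space: %s",
			str_yes_no(is_ro), str_no_yes(is_full));
}
```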
    
    Signed-off-by: Thorsten Blum <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    toblux authored and kdave committed Nov 11, 2024
    4f285a7
  75. btrfs: fix wrong sizeof in btrfs_do_encoded_write()

    btrfs_do_encoded_write() was converted to use folios in 400b172,
    but we're still allocating based on sizeof(struct page *) rather than
    sizeof(struct folio *). There's no functional change.
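
    The reason there is no functional change is that all pointers to
    structure types have the same size in C. A small sketch, using opaque
    stand-in declarations and a hypothetical toy_alloc_folio_array() helper
    that uses calloc() in place of the kernel allocator:

```c
#include <assert.h>
#include <stdlib.h>

/* Opaque stand-ins; only pointers to these types are used here. */
struct page;
struct folio;

/*
 * All pointers to structure types share one size in C, so the old
 * sizeof(struct page *) already allocated the right amount of memory.
 */
_Static_assert(sizeof(struct page *) == sizeof(struct folio *),
	       "struct pointers have the same size");

/* Allocate an array of nr folio pointers; sizeof(*folios) tracks the type. */
static struct folio **toy_alloc_folio_array(size_t nr)
{
	struct folio **folios;

	assert(nr > 0);
	folios = calloc(nr, sizeof(*folios));
	return folios;
}
```

    Writing the element size as sizeof(*folios) keeps the allocation tied
    to the variable's type, so a later type conversion cannot leave a stale
    sizeof behind.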
    
    Reviewed-by: Boris Burkov <[email protected]>
    Signed-off-by: Mark Harmstone <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    maharmstone authored and kdave committed Nov 11, 2024
    b1c5f6e
  76. btrfs: make buffered write copy one page at a time

    Currently btrfs_buffered_write() prepares multiple pages at a time,
    allowing better performance.
    
    But the current trend is to support larger folios as an optimization,
    instead of implementing our own multi-page optimization.
    
    This is inspired by generic_perform_write(), which copies one folio
    at a time.
    
    This change prepares us to migrate to the write_begin() and write_end()
    callbacks, and makes every involved function a little simpler.
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    c87c299
  77. btrfs: convert btrfs_buffered_write() to use folios

    The buffered write path is still heavily utilizing the page interface.
    Since we have converted it to do page-by-page copying, it's much easier
    to convert all involved functions to the folio interface. This involves:
    
    - btrfs_copy_from_user()
    - btrfs_drop_folio()
    - prepare_uptodate_page()
    - prepare_one_page()
    - lock_and_cleanup_extent_if_need()
    - btrfs_dirty_page()
    
    All functions are changed to accept a folio parameter, and if the word
    "page" is in the function name, it is changed to "folio" too.
    
    The function btrfs_dirty_page() is exported for the v1 space cache, so
    the v1 cache call site now converts its page to a folio for the new
    interface.
    
    There is also a small enhancement for prepare_one_folio(): instead of
    manually waiting for the page writeback, let __filemap_get_folio()
    handle that by using FGP_WRITEBEGIN, which implies
    (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE).
    
    Signed-off-by: Qu Wenruo <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    adam900710 authored and kdave committed Nov 11, 2024
    e820dbe
  78. btrfs: use filemap_get_folio() helper

    When fgp_flags and gfp_flags are zero, use filemap_get_folio(A, B)
    instead of __filemap_get_folio(A, B, 0, 0); there is no need for the
    extra arguments 0, 0.
    
    Signed-off-by: Anand Jain <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    asj authored and kdave committed Nov 11, 2024
    d07eaa9
  79. btrfs: implement partial deletion of RAID stripe extents

    In our CI system, the RAID stripe tree configuration sometimes fails with
    the following ASSERT():
    
      assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64
    
    This ASSERT() triggers because, for the initial design of the RAID
    stripe-tree, I had the "one ordered-extent equals one bio" rule of zoned
    btrfs in mind.
    
    But for a RAID stripe-tree based system that is hosted not on a zoned
    storage device but on a regular device, this rule doesn't apply.
    
    So in case the range we want to delete starts in the middle of the
    previous item, grab the item and "truncate" its length. That is, clone
    the item, subtract the deleted portion from the key's offset, delete the
    old item and insert the new one.
    
    In case the range to delete ends in the middle of an item, we have to
    adjust both the item's key as well as the stripe extents and then
    re-insert the modified clone into the tree after deleting the old stripe
    extent.
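    
    As a rough illustration of the two cases above, here is a userspace
    sketch in C. It is not the kernel implementation: struct stripe_item,
    truncate_tail() and trim_front() are hypothetical names, and the model
    assumes a single stride per item, with 'start' standing in for the
    key's objectid and 'length' for the key's offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of one RAID stripe extent item. */
struct stripe_item {
	uint64_t start;    /* logical start (key objectid in this model) */
	uint64_t length;   /* logical length (key offset in this model) */
	uint64_t physical; /* physical start of the single stride modelled */
};

/*
 * The deleted range starts in the middle of the item: keep the front
 * part by shortening the length. The physical start is unchanged.
 */
static struct stripe_item truncate_tail(struct stripe_item it,
					uint64_t del_start)
{
	assert(del_start > it.start && del_start < it.start + it.length);
	it.length = del_start - it.start;
	return it;
}

/*
 * The deleted range ends in the middle of the item: keep the back part
 * by advancing both the key and the stride's physical address.
 */
static struct stripe_item trim_front(struct stripe_item it, uint64_t del_end)
{
	uint64_t diff;

	assert(del_end > it.start && del_end < it.start + it.length);
	diff = del_end - it.start;
	it.start += diff;
	it.length -= diff;
	it.physical += diff;
	return it;
}
```

    For an item covering [1000, 1100) at physical 5000, deleting from 1040
    onwards leaves a length of 40, while deleting up to 1040 leaves an item
    starting at 1040 with length 60 and physical 5040. In both cases the
    modified clone would replace the old item in the tree.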
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Nov 11, 2024
    6aea95e
  80. btrfs: tests: implement case for partial RAID stripe-tree delete

    Implement self-tests for partial deletion of RAID stripe-tree entries.
    
    These two new tests cover both the deletion of the front of a RAID
    stripe-tree stripe extent as well as truncation of an item to make it
    smaller.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    morbidrsa authored and kdave committed Nov 11, 2024
    6e6ecde
  81. btrfs: reduce extent tree lock contention when searching for inline b…

    …ackref
    
    When inserting an extent backref, in order to check whether refs other
    than inline refs are used, we always search the tree with the path's
    keep_locks set, which increases lock contention on the extent tree.
    
    We do not need the parent node every time to determine whether normal
    refs are used.  It is only needed when the extent item is the last item
    in a leaf.
    
    Therefore, first search with keep_locks=0.  Only if the extent item
    happens to be the last item in the leaf do we search a second time with
    keep_locks=1.  This reduces lock contention.
    
    Reviewed-by: Filipe Manana <[email protected]>
    Signed-off-by: Robbie Ko <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Robbie Ko authored and kdave committed Nov 11, 2024
    1d16c27
  82. btrfs: remove BUG_ON() at btrfs_destroy_delayed_refs()

    At btrfs_destroy_delayed_refs() it's unexpected to not find the block
    group to which a delayed reference's extent belongs, so we have this
    BUG_ON(), not just because it's highly unexpected but also because we
    don't know what to do there.
    
    Since we are in the transaction abort path, there's nothing we can do
    other than proceed and clean up all the resources we can. So remove
    the BUG_ON() and deal with a missing block group by logging an error
    message, continuing to clean up all we can related to the current
    delayed ref head, and moving on to the other delayed refs.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    00f5296
  83. btrfs: move btrfs_destroy_delayed_refs() to delayed-ref.c

    It's better suited at delayed-ref.c since it's about delayed refs and
    contains logic to iterate over them (using the red black tree, doing all
    the locking, freeing, etc), so move it from disk-io.c, which is pretty
    big, into delayed-ref.c, hiding implementation details of how delayed
    refs are tracked and managed. This also facilitates the next patches in
    the series.
    
    This change moves the code between files but also does the following
    simple cleanups:
    
    1) Rename the 'cache' variable to 'bg', since it's a block group
       (the 'cache' logic comes from old days where the block group
       structure was named 'btrfs_block_group_cache');
    
    2) Move the 'ref' variable declaration to the scope of the inner
       while loop, since it's not used outside that loop.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    22a0ae1
  84. btrfs: remove fs_info parameter from btrfs_destroy_delayed_refs()

    The fs_info parameter is redundant because it can be extracted from the
    transaction given as another parameter. So remove it and use the fs_info
    accessible from the transaction.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    2f6e05a
  85. btrfs: remove fs_info parameter from btrfs_cleanup_one_transaction()

    The fs_info parameter is redundant because it can be extracted from the
    transaction given as another parameter. So remove it and use the fs_info
    accessible from the transaction.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    c3a5888
  86. btrfs: remove duplicated code to drop delayed ref during transaction …

    …abort
    
    When destroying delayed refs during a transaction abort, we have open
    coded the removal of a delayed ref, which is also done by the static
    helper function drop_delayed_ref(). So remove that duplicated code and
    use drop_delayed_ref() instead.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    8d07a8f
  87. btrfs: use helper to find first ref head at btrfs_destroy_delayed_refs()

    Instead of open coding it, use the find_first_ref_head() helper at
    btrfs_destroy_delayed_refs(). This avoids duplicating the logic,
    especially with the upcoming changes in subsequent patches.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    055903c
  88. btrfs: remove num_entries atomic counter from delayed ref root

    The atomic counter 'num_entries' is not used anymore: we increment it
    and decrement it but we never read it to use for any logic.
    Its last use was removed with commit 61a56a9 ("btrfs: delayed refs
    pre-flushing should only run the heads we have"). So remove it.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    f7d4b49
  89. btrfs: change return type of btrfs_delayed_ref_lock() to boolean

    The function only returns 0, meaning it was able to lock the delayed ref
    head, or -EAGAIN in case it wasn't able to lock it. So simplify this and
    use a boolean return type instead, returning true if it was able to lock
    and false otherwise.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    7ef3604
  90. btrfs: simplify obtaining a delayed ref head

    Instead of doing it in two steps outside of delayed-ref.c, leaking low
    level details such as locking, move the logic entirely to delayed-ref.c
    under btrfs_select_ref_head(), reducing code and making things simpler
    for the caller.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    a98048e
  91. btrfs: move delayed ref head unselection to delayed-ref.c

    The unselect_delayed_ref_head() at extent-tree.c doesn't really belong in
    that file as it's a delayed refs specific detail and therefore should be
    at delayed-ref.c. Furthermore, its inverse, btrfs_select_ref_head(), is at
    delayed-ref.c, so it only makes sense to have it there too.
    
    So move unselect_delayed_ref_head() into delayed-ref.c and rename it to
    btrfs_unselect_ref_head() so that its name closely matches its inverse
    (btrfs_select_ref_head()).
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    58a4391
  92. btrfs: pass fs_info to functions that search for delayed ref heads

    One of the following patches in the series will need to access fs_info in
    the function find_ref_head(), so pass a fs_info argument to it as well as
    to the functions btrfs_select_ref_head() and btrfs_find_delayed_ref_head()
    which call find_ref_head().
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    765f828
  93. btrfs: pass fs_info to btrfs_delete_ref_head()

    One of the following patches in the series will need to access fs_info at
    btrfs_delete_ref_head(), so pass a fs_info argument to it.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    5f54384
  94. btrfs: assert delayed refs lock is held at find_ref_head()

    We have 3 callers for find_ref_head() so assert at find_ref_head() that we
    have the delayed refs lock held, removing the assertion from one of its
    callers (btrfs_find_delayed_ref_head()).
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    7226ed7
  95. btrfs: assert delayed refs lock is held at find_first_ref_head()

    The delayed refs lock must be held when calling find_first_ref_head(), so
    assert that it's being held.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    64a71f0
  96. btrfs: assert delayed refs lock is held at add_delayed_ref_head()

    The delayed refs lock must be held when calling add_delayed_ref_head(),
    so assert that it's being held.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    a8985ac
  97. btrfs: add comments regarding locking to struct btrfs_delayed_ref_root

    Add some comments to struct btrfs_delayed_ref_root's fields to mention
    what its spinlock protects.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    d3aaeea
  98. btrfs: track delayed ref heads in an xarray

    Currently we use a red black tree (rb-tree) to track the delayed ref
    heads (in struct btrfs_delayed_ref_root::href_root). This however is not
    very efficient when the number of delayed ref heads is large (and it's
    very common to be at least in the order of thousands) since rb-trees are
    binary trees. For example, for 10K delayed ref heads, the tree has a depth
    of 13. Besides that, inserting into the tree requires navigating through
    it and pulling useless cache lines in the process since the red black tree
    nodes are embedded within the delayed ref head structure - on the other
    hand, by being embedded, it requires no extra memory allocations.
    
    We can improve this by using an xarray instead which has a much higher
    branching factor than a red black tree (binary balanced tree) and is more
    cache friendly and behaves like a resizable array, with a much better
    search and insertion complexity than a red black tree. This only has one
    small disadvantage which is that insertion will sometimes require
    allocating memory for the xarray - which may fail (not that often since
    it uses a kmem_cache) - but on the other hand we can reduce the delayed
    ref head structure size by 24 bytes (from 152 down to 128 bytes) after
    removing the embedded red black tree node, meaning that we can now fit
    32 delayed ref heads per 4K page instead of 26, and that gain compensates
    for the occasional memory allocations needed for the xarray nodes. We
    also end up using only 2 cache lines instead of 3 per delayed ref head.
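    
    The figures quoted above can be re-derived with a few lines of C. This
    is only an arithmetic sketch: the helper names are hypothetical, and it
    assumes a 4096-byte page and 64-byte cache lines.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE   4096u /* assumed page size in bytes */
#define MODEL_CACHE_LINE  64u   /* assumed cache line size in bytes */

/* How many whole objects of the given size fit in one page. */
static unsigned int objs_per_page(unsigned int obj_size)
{
	assert(obj_size > 0);
	return MODEL_PAGE_SIZE / obj_size;
}

/* How many cache lines an object of the given size spans (rounded up). */
static unsigned int cache_lines(unsigned int obj_size)
{
	return (obj_size + MODEL_CACHE_LINE - 1) / MODEL_CACHE_LINE;
}

/*
 * floor(log_base(n)): the depth, counted in edges, of a perfectly
 * balanced tree with the given fanout that holds n entries.
 */
static unsigned int floor_log(unsigned long n, unsigned long base)
{
	unsigned int depth = 0;

	assert(n >= 1 && base >= 2);
	while (n >= base) {
		n /= base;
		depth++;
	}
	return depth;
}
```

    With these helpers, objs_per_page(152) is 26 and objs_per_page(128) is
    32, cache_lines() goes from 3 to 2, and floor_log(10000, 2) is 13. For
    comparison, a hypothetical 64-way tree gives floor_log(10000, 64) == 2,
    though the height of a real xarray depends on the largest index stored
    rather than on the entry count.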
    
    Running the following fs_mark test showed some improvements:
    
        $ cat test.sh
        #!/bin/bash
    
        DEV=/dev/nullb0
        MNT=/mnt/nullb0
        MOUNT_OPTIONS="-o ssd"
        FILES=100000
        THREADS=$(nproc --all)
    
        echo "performance" | \
            tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    
        mkfs.btrfs -f $DEV
        mount $MOUNT_OPTIONS $DEV $MNT
    
        OPTS="-S 0 -L 5 -n $FILES -s 0 -t $THREADS -k"
        for ((i = 1; i <= $THREADS; i++)); do
            OPTS="$OPTS -d $MNT/d$i"
        done
    
        fs_mark $OPTS
    
        umount $MNT
    
    Before this patch:
    
       FSUse%        Count         Size    Files/sec     App Overhead
           10      1200000            0     171845.7         12253839
           16      2400000            0     230898.7         12308254
           23      3600000            0     212292.9         12467768
           30      4800000            0     195737.8         12627554
           46      6000000            0     171055.2         12783329
    
    After this patch:
    
       FSUse%        Count         Size    Files/sec     App Overhead
           10      1200000            0     173835.0         12246131
           16      2400000            0     233537.8         12271746
           23      3600000            0     220398.7         12307737
           30      4800000            0     204483.6         12392318
           40      6000000            0     182923.3         12771843
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    928ed13
  99. btrfs: remove no longer used delayed ref head search functionality

    After the previous patch, which converted the rb-tree used to track
    delayed ref heads into an xarray, the find_ref_head() function is now
    used only by one caller which always passes false to the 'return_bigger'
    argument. So remove the 'return_bigger' logic, simplifying the function,
    and move all the function code to the single caller.
    
    Reviewed-by: Boris Burkov <[email protected]>
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    7f13360
  100. btrfs: remove pointless iocb::ki_pos addition in btrfs_encoded_read()

    iocb->ki_pos isn't used after this function, so there's no point in
    changing its value.
    
    Signed-off-by: Mark Harmstone <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    maharmstone authored and kdave committed Nov 11, 2024
    4bca741
  101. btrfs: change btrfs_encoded_read() so that reading of extent is done …

    …by caller
    
    Change the behaviour of btrfs_encoded_read() so that if it needs to read
    an extent from disk, it leaves the extent and inode locked and returns
    -EIOCBQUEUED. The caller is then responsible for doing the I/O via
    btrfs_encoded_read_regular() and unlocking the extent and inode.
    
    Signed-off-by: Mark Harmstone <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    maharmstone authored and kdave committed Nov 11, 2024
    26efd44
  102. btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is set

    Change btrfs_encoded_read() so that it returns -EAGAIN rather than
    sleeping if IOCB_NOWAIT is set in iocb->ki_flags. The conditions that
    require sleeping are: inode lock, writeback, extent lock, ordered range.
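    
    The general "try, and bail out with -EAGAIN instead of waiting" pattern
    can be sketched in userspace C as below. This is not the btrfs code:
    struct model_lock and both helpers are hypothetical, a C11 atomic flag
    stands in for a kernel lock, and a busy loop stands in for sleeping.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one of the locks that may require waiting. */
struct model_lock {
	atomic_flag held;
};

/*
 * Acquire the lock. With nowait set, never wait: return -EAGAIN if the
 * lock is already held so the caller can retry from a blocking context.
 * Without nowait, spin until the lock becomes free (a real
 * implementation would sleep instead of spinning).
 */
static int model_lock_acquire(struct model_lock *l, bool nowait)
{
	assert(l != NULL);
	if (nowait) {
		if (atomic_flag_test_and_set(&l->held))
			return -EAGAIN;
		return 0;
	}
	while (atomic_flag_test_and_set(&l->held))
		; /* placeholder for sleeping */
	return 0;
}

/* Release a lock taken by model_lock_acquire(). */
static void model_lock_release(struct model_lock *l)
{
	assert(l != NULL);
	atomic_flag_clear(&l->held);
}
```

    A caller in non-blocking mode would apply the same check at each of the
    four points listed above and propagate -EAGAIN as soon as any of them
    would have to wait.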
    
    Signed-off-by: Mark Harmstone <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    maharmstone authored and kdave committed Nov 11, 2024
    973a432
  103. btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()

    Change btrfs_encoded_read_regular_fill_pages() so that the priv struct
    is allocated rather than stored on the stack, in preparation for adding
    an asynchronous mode to the function.
    
    Signed-off-by: Mark Harmstone <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    maharmstone authored and kdave committed Nov 11, 2024
    68d3b27
  104. btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl)

    Add an io_uring command for encoded reads, using the same interface as
    the existing BTRFS_IOC_ENCODED_READ ioctl.
    
    btrfs_uring_encoded_read() is an io_uring version of
    btrfs_ioctl_encoded_read(), which validates the user input and calls
    btrfs_encoded_read() to read the appropriate metadata. If we determine
    that we need to read an extent from disk, we call
    btrfs_encoded_read_regular_fill_pages() through
    btrfs_uring_read_extent() to prepare the bio.
    
    The existing btrfs_encoded_read_regular_fill_pages() is changed so that
    if it is passed a valid uring_ctx, rather than waking up any waiting
    threads it calls btrfs_uring_read_extent_endio(). This in turn copies
    the read data back to userspace, and calls io_uring_cmd_done() to
    complete the io_uring command.
    
    Because we're potentially doing a non-blocking read,
    btrfs_uring_read_extent() doesn't clean up after itself if it returns
    -EIOCBQUEUED. Instead, it allocates a priv struct, populates the fields
    there that we will need to unlock the inode and free our allocations,
    and defers this to the btrfs_uring_read_finished() that gets called when
    the bio completes.
    
    Signed-off-by: Mark Harmstone <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    maharmstone authored and kdave committed Nov 11, 2024
    34310c4
  105. btrfs: add struct io_btrfs_cmd as type for io_uring_cmd_to_pdu()

    Add struct io_btrfs_cmd as a wrapper type for io_uring_cmd_to_pdu(),
    rather than using a raw pointer.
    
    Suggested-by: Pavel Begunkov <[email protected]>
    Signed-off-by: Mark Harmstone <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    maharmstone authored and kdave committed Nov 11, 2024
    1cc86ae
  106. io_uring/cmd: let cmds to know about dying task

    When the task that submitted a request is dying, a task work for that
    request might get run by a kernel thread or even worse by a half
    dismantled task. We can't just cancel the task work without running the
    callback as the cmd might need to do some clean up, so pass a flag
    instead. If set, it's not safe to access any task resources and the
    callback is expected to cancel the cmd ASAP.
    
    Reviewed-by: Jens Axboe <[email protected]>
    Reviewed-by: Ming Lei <[email protected]>
    Signed-off-by: Pavel Begunkov <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    isilence authored and kdave committed Nov 11, 2024
    df3b8ca
  107. btrfs: push cleanup into btrfs_read_locked_inode()

    Move btrfs_add_inode_to_root() so it can be called from
    btrfs_read_locked_inode(), no changes were made to the function.
    
    Move cleanup code from btrfs_iget_path() to btrfs_read_locked_inode().
    This improves readability and fixes a leaky abstraction. Previously
    btrfs_iget_path() had to handle a positive error case as a result of a
    call to btrfs_search_slot(), but it makes more sense to handle this
    closer to the source of the call.
    
    Signed-off-by: Leo Martins <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    loemraw authored and kdave committed Nov 11, 2024
    6967399
  108. btrfs: remove conditional path allocation in btrfs_read_locked_inode()

    Remove conditional path allocation from btrfs_read_locked_inode(). Add
    an ASSERT(path) to indicate it should never be called with a NULL path.
    
    Call btrfs_read_locked_inode() directly from btrfs_iget(). This causes
    code duplication between btrfs_iget() and btrfs_iget_path(), but I
    think this is justifiable as it removes the need for conditionally
    allocating the path inside of btrfs_read_locked_inode(). This makes the
    code easier to reason about and makes it clear who has the
    responsibility of allocating and freeing the path.
    
    Signed-off-by: Leo Martins <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    loemraw authored and kdave committed Nov 11, 2024
    7c855e1
  109. btrfs: simplify range tracking in cow_file_range()

    Simplify tracking of the range processed by using cur_alloc_size only to
    store the reserved part that may fail to the allocated extent. Remove
    the ram_size as well since it is always equal to cur_alloc_size in the
    context. Advance the start in normal path until extent allocation
    succeeds and keep the start unchanged in the error handling path.
    
    Passed the fstests generic/475 test a hundred times with quota
    enabled, and a modified generic/475 test, with the sleep time removed,
    a hundred times. About one tenth of the runs do enter the error
    handling path due to a failure to reserve an extent.
    
    Suggested-by: Qu Wenruo <[email protected]>
    Signed-off-by: Haisu Wang <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Haisu Wang authored and kdave committed Nov 11, 2024
    5599f39
  110. btrfs: add new ioctl to wait for cleaned subvolumes

    Add a new unprivileged ioctl that will let the command
    'btrfs subvolume sync' work without the (privileged) SEARCH_TREE ioctl.
    
    There are several modes of operation, where the most common ones are to
    wait on a specific subvolume or all currently queued for cleaning. This
    is utilized e.g. in backup applications that delete subvolumes and wait
    until they're cleaned to check for remaining space.
    
    The other modes are for flexibility, e.g. for monitoring or
    checkpoints in the queue of deleted subvolumes, again without the need
    to use SEARCH_TREE.
    
    Notes:
    
    - waiting is interruptible, the timeout is set to 1 second and is not
      configurable
    
    - repeated calls to the ioctl see a different state, so this is
      inherently racy when using e.g. the count or peek next/last
    
    Use cases:
    
    - a subvolume A was deleted, wait for cleaning (WAIT_FOR_ONE)
    
    - a bunch of subvolumes were deleted, wait for all (WAIT_FOR_QUEUED or
      PEEK_LAST + WAIT_FOR_ONE)
    
    - count how many are queued (not blocking), for monitoring purposes
    
    - report progress (PEEK_NEXT), may miss some if cleaning is quick
    
    - own waiting in user space (PEEK_LAST until it's 0)
    
    Signed-off-by: David Sterba <[email protected]>
    kdave committed Nov 11, 2024
    6c83d15
  111. btrfs: update stale comment for struct btrfs_delayed_ref_node::add_list

    The comment refers to a list in the respective delayed ref head that no
    longer exists (ref_list); it was replaced with a rbtree (ref_tree) in
    commit 0e0adbc ("btrfs: track refs in a rb_tree instead of a list").
    
    So update the stale comment to refer to the rbtree instead of the old
    list.
    
    Reviewed-by: Johannes Thumshirn <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    dd0896e
  112. btrfs: remove hole from struct btrfs_delayed_node

    On x86_64 and a release kernel, there's a 4-byte hole in the structure
    after the ref count field:
    
      struct btrfs_delayed_node {
              u64                        inode_id;             /*     0     8 */
              u64                        bytes_reserved;       /*     8     8 */
              struct btrfs_root *        root;                 /*    16     8 */
              struct list_head           n_list;               /*    24    16 */
              struct list_head           p_list;               /*    40    16 */
              struct rb_root_cached      ins_root;             /*    56    16 */
              /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
              struct rb_root_cached      del_root;             /*    72    16 */
              struct mutex               mutex;                /*    88    32 */
              struct btrfs_inode_item    inode_item;           /*   120   160 */
              /* --- cacheline 4 boundary (256 bytes) was 24 bytes ago --- */
              refcount_t                 refs;                 /*   280     4 */
    
              /* XXX 4 bytes hole, try to pack */
    
              u64                        index_cnt;            /*   288     8 */
              long unsigned int          flags;                /*   296     8 */
              int                        count;                /*   304     4 */
              u32                        curr_index_batch_size; /*   308     4 */
              u32                        index_item_leaves;    /*   312     4 */
    
              /* size: 320, cachelines: 5, members: 15 */
              /* sum members: 312, holes: 1, sum holes: 4 */
              /* padding: 4 */
      };
    
    Move the 'count' field, which is 4 bytes long, to just below the ref count
    field, so we eliminate the hole and reduce the structure size from 320
    bytes down to 312 bytes:
    
      struct btrfs_delayed_node {
              u64                        inode_id;             /*     0     8 */
              u64                        bytes_reserved;       /*     8     8 */
              struct btrfs_root *        root;                 /*    16     8 */
              struct list_head           n_list;               /*    24    16 */
              struct list_head           p_list;               /*    40    16 */
              struct rb_root_cached      ins_root;             /*    56    16 */
              /* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
              struct rb_root_cached      del_root;             /*    72    16 */
              struct mutex               mutex;                /*    88    32 */
              struct btrfs_inode_item    inode_item;           /*   120   160 */
              /* --- cacheline 4 boundary (256 bytes) was 24 bytes ago --- */
              refcount_t                 refs;                 /*   280     4 */
              int                        count;                /*   284     4 */
              u64                        index_cnt;            /*   288     8 */
              long unsigned int          flags;                /*   296     8 */
              u32                        curr_index_batch_size; /*   304     4 */
              u32                        index_item_leaves;    /*   308     4 */
    
              /* size: 312, cachelines: 5, members: 15 */
              /* last cacheline: 56 bytes */
      };
    
    This allows 13 delayed nodes to fit in a 4K page instead of 12.
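
    The effect of the reordering can be reproduced in userspace with a
    minimal sketch. The two structs below are hypothetical stand-ins for
    only the tail of struct btrfs_delayed_node (uint32_t in place of
    refcount_t, uint64_t in place of the 8-byte long unsigned int on
    x86_64), and objs_per_page() is an illustrative helper, not a kernel
    function. The sketch assumes a platform where uint64_t is 8-byte
    aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tail of the struct before the change. */
struct tail_before {
	uint32_t refs;                  /* stand-in for refcount_t */
	/* 4-byte hole: the next member needs 8-byte alignment */
	uint64_t index_cnt;
	uint64_t flags;                 /* stand-in for long unsigned int */
	int32_t  count;
	uint32_t curr_index_batch_size;
	uint32_t index_item_leaves;
	/* 4 bytes of tail padding to keep the struct 8-byte aligned */
};

/* Same members, with 'count' moved right after 'refs'. */
struct tail_after {
	uint32_t refs;
	int32_t  count;                 /* fills the former hole */
	uint64_t index_cnt;
	uint64_t flags;
	uint32_t curr_index_batch_size;
	uint32_t index_item_leaves;
};

/* Number of whole objects of obj_size bytes that fit in one page. */
static inline size_t objs_per_page(size_t page_size, size_t obj_size)
{
	return page_size / obj_size;
}
```

    Under those assumptions the reordered tail is 8 bytes smaller, which
    matches the 320 to 312 byte reduction reported by pahole, and
    objs_per_page(4096, 320) and objs_per_page(4096, 312) give the 12 and
    13 nodes per page quoted in the commit message.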
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    a20725e
  113. btrfs: simplify logic to decrement snapshot counter at btrfs_mksnapsh…

    …ot()
    
    There's no point in having a 'snapshot_force_cow' variable to track if we
    need to decrement the root->snapshot_force_cow counter, as we never jump
    to the 'out' label after incrementing the counter. Simplify this by
    removing the variable and always decrementing the counter before the 'out'
    label, right after the call to btrfs_mksubvol().
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    e36d114
  114. btrfs: avoid superfluous calls to free_extent_map() in btrfs_encoded_…

    …read()
    
    Change the control flow of btrfs_encoded_read() so that it doesn't call
    free_extent_map() when we know that this has already been done.
    
    Reviewed-by: Anand Jain <[email protected]>
    Signed-off-by: Mark Harmstone <[email protected]>
    Suggested-by: Anand Jain <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    maharmstone authored and kdave committed Nov 11, 2024
    08fdca9
  115. btrfs: fix a typo in btrfs_use_zone_append

    REQ_OP_ZONE_APPNED -> REQ_OP_ZONE_APPEND.
    
    Signed-off-by: Christoph Hellwig <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    Christoph Hellwig authored and kdave committed Nov 11, 2024
    80b3695
  116. btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_…

    …ioctl()
    
    Smatch complains about calling PTR_ERR() against a NULL pointer:
    
      fs/btrfs/super.c:2272 btrfs_control_ioctl() warn: passing zero to 'PTR_ERR'
    
    Fix this by calling PTR_ERR() against the device pointer only if it
    contains an error.
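
    The pattern can be sketched in userspace as follows. ERR_PTR(),
    PTR_ERR() and IS_ERR() here are simplified re-implementations of the
    kernel helpers of the same names, and scan_result() is a hypothetical
    function that is not part of btrfs. The sketch assumes that a NULL
    device is not an error and should map to a return value of 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Hypothetical helper: convert a "device" pointer into a return code.
 * PTR_ERR() is only applied once IS_ERR() has confirmed an error, so it
 * is never called on NULL.
 */
static inline int scan_result(const void *device)
{
	if (IS_ERR(device))
		return (int)PTR_ERR(device);
	return 0; /* NULL or a valid pointer (assumed to mean success) */
}
```

    Checking with IS_ERR() first keeps the NULL case out of PTR_ERR()
    entirely, which is the situation the Smatch warning points at.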
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    2342d65
  117. btrfs: remove check for NULL fs_info at btrfs_folio_end_lock_bitmap()

    Smatch complains about possibly dereferencing a NULL fs_info at
    btrfs_folio_end_lock_bitmap():
    
      fs/btrfs/subpage.c:332 btrfs_folio_end_lock_bitmap() warn: variable dereferenced before check 'fs_info' (see line 326)
    
    because we access fs_info to set the 'start_bit' variable before doing the
    check for a NULL fs_info.
    
    However, fs_info is never NULL, since the only caller of
    btrfs_folio_end_lock_bitmap() is extent_writepage(), where we have an
    inode which always has a non-NULL fs_info.
    
    So remove the check for a NULL fs_info at btrfs_folio_end_lock_bitmap().
    
    Reviewed-by: Qu Wenruo <[email protected]>
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    722d343
  118. btrfs: send: check for dead send root under critical section

    We're checking if the send root is dead without the protection of the
    root's root_item_lock spinlock, which is what protects the root's flags.
    The inverse, setting the dead flag on a root, is done under the protection
    of that lock, at btrfs_delete_subvolume(). Also checking and updating the
    root's send_in_progress counter is supposed to be done in the same
    critical section as checking for or setting the root dead flag, so that
    these operations are done atomically as a single step (which is correctly
    done by btrfs_delete_subvolume()).
    
    So fix this by checking if the send root is dead in the same critical
    section that updates the send_in_progress counter, which is protected by
    the root's root_item_lock spinlock.
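
    The locking rule described above can be illustrated with a userspace
    sketch. Everything in it is hypothetical: struct toy_root and the
    toy_*() functions are not btrfs code, and a pthread mutex stands in
    for the root_item_lock spinlock. It only shows why the flag check and
    the counter update belong in one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the fields involved. */
struct toy_root {
	pthread_mutex_t lock;      /* stand-in for root_item_lock */
	bool dead;                 /* stand-in for the root dead flag */
	int send_in_progress;
};

/* Send side: check the flag and bump the counter under one lock hold. */
static inline bool toy_send_begin(struct toy_root *root)
{
	bool ok;

	pthread_mutex_lock(&root->lock);
	ok = !root->dead;
	if (ok)
		root->send_in_progress++;
	pthread_mutex_unlock(&root->lock);
	return ok;
}

static inline void toy_send_end(struct toy_root *root)
{
	pthread_mutex_lock(&root->lock);
	root->send_in_progress--;
	pthread_mutex_unlock(&root->lock);
}

/* Delete side: check the counter and set the flag under one lock hold. */
static inline bool toy_mark_dead(struct toy_root *root)
{
	bool ok;

	pthread_mutex_lock(&root->lock);
	ok = (root->send_in_progress == 0);
	if (ok)
		root->dead = true;
	pthread_mutex_unlock(&root->lock);
	return ok;
}
```

    Because both sides read one field and write the other while holding
    the same lock, a send can never start on a root that was just marked
    dead, and a root can never be marked dead while a send that already
    passed the check is running.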
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    dc058f5
  119. btrfs: send: check for read-only send root under critical section

    We're checking if the send root is read-only without being under the
    protection of the root's root_item_lock spinlock, which is what protects
    the root's flags when clearing the read-only flag, done at
    btrfs_ioctl_subvol_setflags(). Furthermore, it should be done in the
    same critical section that increments the root's send_in_progress counter,
    as btrfs_ioctl_subvol_setflags() clears the read-only flag in the same
    critical section that checks the counter's value.
    
    So fix this by moving the read-only check under the critical section
    delimited by the root's root_item_lock which also increments the root's
    send_in_progress counter.
    
    Signed-off-by: Filipe Manana <[email protected]>
    Reviewed-by: David Sterba <[email protected]>
    Signed-off-by: David Sterba <[email protected]>
    fdmanana authored and kdave committed Nov 11, 2024
    e82c936