Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux: add zpl_drop_inode #16612

Closed
wants to merge 2 commits into from
Closed

Conversation

snajpa
Copy link
Contributor

@snajpa snajpa commented Oct 6, 2024

Motivation and Context

This is an attempt to rectify the situation in zfs_zget() on Linux, where we might be racing with inode reclaim from the kernel.

The code is ugly and I'm not sure it's perfectly OK to do it this way anyway.

Now I'm moderately sure the refcount check is right, but the more eyes it gets the better.

Description

There's now VERIFY-wrapped mandatory igrab() in zget(), because inode lifetime should be guaranteed by the new zpl_drop_inode super_op. This should simplify reasoning about znode/inode lifetimes as both alloc and drop are defined by zpl_super_operations now.

How Has This Been Tested?

Bots + locally (+ hammered by template builds)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@snajpa snajpa force-pushed the zpl-drop-inode branch 2 times, most recently from f6c80c6 to 08a1cc1 Compare October 7, 2024 13:19
@behlendorf behlendorf added the Status: Work in Progress Not yet ready for general review label Oct 7, 2024
@snajpa snajpa force-pushed the zpl-drop-inode branch 5 times, most recently from a87e1db to a747e6d Compare October 8, 2024 16:45
snajpa added a commit to vpsfreecz/zfs that referenced this pull request Oct 8, 2024
Other than openzfs#16612, this is the only place I see where trouble
similar to openzfs#16608 could arise.

I'm not sure all the supported kernel versions will be okay with
these modifications, I'm posting this as a draft to let the bots
chew on it to find out.

Signed-off-by: Pavel Snajdr <[email protected]>
snajpa added a commit to vpsfreecz/zfs that referenced this pull request Oct 8, 2024
Other than openzfs#16612, this is the only place I see where trouble
similar to openzfs#16608 could arise.

I'm not sure all the supported kernel versions will be okay with
these modifications, I'm posting this as a draft to let the bots
chew on it to find out.

Signed-off-by: Pavel Snajdr <[email protected]>
@snajpa snajpa marked this pull request as ready for review October 9, 2024 01:54
@snajpa
Copy link
Contributor Author

snajpa commented Oct 10, 2024

Might be a good idea to wrap the sa buffer hold check in znode hold enter...

@snajpa
Copy link
Contributor Author

snajpa commented Oct 10, 2024

Or not... BUG: scheduling while atomic: z_zrele/1836/0x00000002

Might be worth noting here: zpl_drop_inode is called in atomic context while a spinlock is held... which means we can't do any mutex locking in this callback.

@snajpa snajpa marked this pull request as draft October 11, 2024 12:40
@snajpa snajpa force-pushed the zpl-drop-inode branch 4 times, most recently from 0f5f30f to c898606 Compare October 21, 2024 09:50
This is an attempt to rectify the situation in zfs_zget() on Linux,
where we might have been racing with inode reclaim from the kernel.

New zpl_drop_inode function should ensure this scenario never occurs.

Signed-off-by: Pavel Snajdr <[email protected]>
This should avoid this VERIFY trip:

VERIFY3(tx->tx_txg <= spa_final_dirty_txg(os->os_spa)) failed (63 <= 62
PANIC at dbuf.c:2324:dbuf_dirty()
Showing stack for process 587858
CPU: 1 PID: 587858 Comm: txg_sync Tainted: P           OE      6.10.12-
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-
Call Trace:
 <TASK>
 dump_stack_lvl+0x64/0x80
 spl_panic+0x100/0x120 [spl]
 dbuf_dirty+0xd5a/0x1300 [zfs]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? dmu_buf_will_dirty_impl+0xdf/0x330 [zfs]
 spa_history_log_sync+0x11e/0x620 [zfs]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? list_head+0x9/0x30 [zfs]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? __cv_destroy+0x92/0x250 [spl]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? mutex_lock+0x12/0x30
 dsl_sync_task_sync+0xb9/0x120 [zfs]
 dsl_pool_sync+0x4c7/0x6a0 [zfs]
 spa_sync_iterate_to_convergence+0xd8/0x320 [zfs]
 spa_sync+0x308/0x8e0 [zfs]
 ? __wake_up+0x44/0x60
 txg_sync_thread+0x272/0x3c0 [zfs]
 ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
 ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
 thread_generic_wrapper+0x66/0x90 [spl]
 kthread+0xd2/0x100
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x34/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Signed-off-by: Pavel Snajdr <[email protected]>
@snajpa
Copy link
Contributor Author

snajpa commented Oct 22, 2024

So the main problem with this approach is this: inode might get marked for eviction with inode->i_state |= I_FREEING in ways that go around the ->drop_inode() callback completely, for example (but also lot other places than): prune_icache_sb()->list_lru_shrink_walk()->inode_lru_isolate(). But also when the filesystem is unmounting and freeing the inodes in the final round of evictions in generic_shutdown_super.

These I_FREEING incidents will make igrab fail, so it needs to keep retrying. So overall I think if I'd change anything here, it would be the comment in zfs_zget() to save others this roundtrip thinking it could be done :)

@behlendorf what do you think, do I change this PR to rectify the comment, or am I missing something how it could be done after all? There's also the pitfall of not being able to mutex_enter in the zpl_drop_inode cb, because that is called under the inode spinlock.

@snajpa
Copy link
Contributor Author

snajpa commented Oct 29, 2024

I took a different approach, let's see where that takes me.

@snajpa snajpa closed this Oct 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Work in Progress Not yet ready for general review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants