From 531af601a15d69b6c09af9c288ca214d3b68ee6f Mon Sep 17 00:00:00 2001
From: Rob Norris
Date: Thu, 30 Nov 2023 22:00:03 +1100
Subject: [PATCH] dnode_is_dirty: use dn_dirty_txg to check dirtiness

dn_dirty_txg is always set to the highest txg that has ever dirtied the
dnode. It is set in dbuf_dirty() when a data or metadnode dbuf is
dirtied, and never cleared.

[analysis of bug #15526 and fix #15571 below, for future readers]

The previous dirty check was:

    for (int i = 0; i < TXG_SIZE; i++) {
        if (multilist_link_active(&dn->dn_dirty_link[i])
            [dnode is dirty]

However, this check is not "is the dnode dirty?" but rather, "is the
dnode on a list?". There is a gap in dmu_objset_sync_dnodes() where the
dnode is moved from os_dirty_dnodes to os_synced_dnodes, before
dnode_sync() is called to write out the dirty dbufs. So, there is a
moment when the dnode is not on a list, and so the check fails.

It doesn't matter that the dirty check takes dn_mtx, because that lock
isn't used for dn_dirty_link. The os_dirty_dnodes sublist lock is held
in dmu_objset_sync_dnodes(), but trying to take that would mean possibly
waiting until everything on that sublist has been synced.

The correct fix has to check something that positively asserts the dnode
is dirty, rather than an implementation detail. dn_dirty_txg (via
DNODE_IS_DIRTY()) is that - it's a normal bit of dnode state, under the
dn_mtx lock, and unambiguously indicates whether or not there are
changes pending.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris
---
 module/zfs/dnode.c | 22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/module/zfs/dnode.c b/module/zfs/dnode.c
index 7ae74ad1318d..9da35742b446 100644
--- a/module/zfs/dnode.c
+++ b/module/zfs/dnode.c
@@ -1778,31 +1778,15 @@ dnode_try_claim(objset_t *os, uint64_t object, int slots)
 }
 
 /*
- * Checks if the dnode itself is dirty, or is carrying any uncommitted records.
- * It is important to check both conditions, as some operations (eg appending
- * to a file) can dirty both as a single logical unit, but they are not synced
- * out atomically, so checking one and not the other can result in an object
- * appearing to be clean mid-way through a commit.
- *
- * Do not change this lightly! If you get it wrong, dmu_offset_next() can
- * detect a hole where there is really data, leading to silent corruption.
+ * Check if the dnode (including its data) is dirty on this or any future txg.
  */
 boolean_t
 dnode_is_dirty(dnode_t *dn)
 {
 	mutex_enter(&dn->dn_mtx);
-
-	for (int i = 0; i < TXG_SIZE; i++) {
-		if (multilist_link_active(&dn->dn_dirty_link[i]) ||
-		    !list_is_empty(&dn->dn_dirty_records[i])) {
-			mutex_exit(&dn->dn_mtx);
-			return (B_TRUE);
-		}
-	}
-
+	boolean_t dirty = DNODE_IS_DIRTY(dn);
 	mutex_exit(&dn->dn_mtx);
-
-	return (B_FALSE);
+	return (dirty);
 }
 
 void
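
For future readers, the gap described in the commit message can be
sketched as a toy model in plain C. This is not OpenZFS code: toy_dnode_t,
TOY_TXG_SIZE and every toy_* function are hypothetical stand-ins invented
for illustration. The sketch models only the list-membership half of the
old check, and it assumes that a txg-based check compares the highest
dirtying txg against the txg currently being synced, which is consistent
with the new comment ("dirty on this or any future txg") but is not taken
from the DNODE_IS_DIRTY() definition.

```c
/*
 * Toy model only. None of these names exist in OpenZFS; they are
 * simplified stand-ins for the state discussed in the commit message.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	TOY_TXG_SIZE	4	/* hypothetical stand-in for TXG_SIZE */

typedef struct toy_dnode {
	/* Stand-in for dn_dirty_link[]: "is the dnode on a dirty list?" */
	bool		on_dirty_list[TOY_TXG_SIZE];
	/* Stand-in for dn_dirty_txg: highest txg that ever dirtied it. */
	uint64_t	dirty_txg;
} toy_dnode_t;

/* Dirty the dnode in txg: link it and record the highest dirtying txg. */
static void
toy_dirty(toy_dnode_t *dn, uint64_t txg)
{
	dn->on_dirty_list[txg % TOY_TXG_SIZE] = true;
	if (txg > dn->dirty_txg)
		dn->dirty_txg = txg;
}

/*
 * First half of a sync: the dnode leaves the dirty list, but its data
 * has not been written out yet. dirty_txg is deliberately left alone,
 * mirroring "never cleared" in the commit message.
 */
static void
toy_sync_unlink(toy_dnode_t *dn, uint64_t txg)
{
	dn->on_dirty_list[txg % TOY_TXG_SIZE] = false;
}

/* Old-style check: really asks "is the dnode on a list?". */
static bool
toy_is_dirty_by_link(const toy_dnode_t *dn)
{
	for (int i = 0; i < TOY_TXG_SIZE; i++) {
		if (dn->on_dirty_list[i])
			return (true);
	}
	return (false);
}

/*
 * New-style check (assumed form): the dnode is dirty if it was dirtied
 * in the syncing txg or any later one.
 */
static bool
toy_is_dirty_by_txg(const toy_dnode_t *dn, uint64_t syncing_txg)
{
	return (dn->dirty_txg >= syncing_txg);
}
```

Dirtying a zeroed toy_dnode_t in txg 7 and then calling
toy_sync_unlink(&dn, 7) reproduces the window: toy_is_dirty_by_link()
now returns false while toy_is_dirty_by_txg(&dn, 7) still returns true.
Once the syncing txg advances to 8 with no new writes, the txg-based
check returns false as well.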