dnode_is_dirty: use dn_dirty_txg to check dirtiness
dn_dirty_txg is always set to the highest txg that has ever dirtied the
dnode. It is set in dbuf_dirty() when a data or metadnode dbuf is
dirtied, and never cleared.
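As a rough illustration of these semantics, the standalone C sketch below models a dnode as a hypothetical `dnode_model_t` that holds only its dirty txg. It assumes that `DNODE_IS_DIRTY()` treats a dnode as dirty when `dn_dirty_txg` is at or beyond the txg currently syncing (matching the new comment's "this or any future txg"); `model_dirty()` and `model_is_dirty()` are illustrative names, not ZFS functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for dnode_t: only the field this commit relies on. */
typedef struct {
	uint64_t dirty_txg; /* highest txg that ever dirtied this dnode; 0 = never */
} dnode_model_t;

/* Loose analogue of what dbuf_dirty() does: remember the highest txg seen. */
static void
model_dirty(dnode_model_t *dn, uint64_t txg)
{
	if (txg > dn->dirty_txg)
		dn->dirty_txg = txg;
}

/*
 * Assumed analogue of DNODE_IS_DIRTY(): changes are pending if the dnode
 * was dirtied in the txg now syncing or in a later one. The field is never
 * cleared; it simply falls behind once sync moves past it.
 */
static bool
model_is_dirty(const dnode_model_t *dn, uint64_t syncing_txg)
{
	return (dn->dirty_txg >= syncing_txg);
}
```

Because the value only ever grows, there is no intermediate state in which it can momentarily read as "clean" while changes for the syncing txg are still outstanding.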

[analysis of bug openzfs#15526 and fix openzfs#15571 below, for future readers]

The previous dirty check was:

    for (int i = 0; i < TXG_SIZE; i++) {
        if (multilist_link_active(&dn->dn_dirty_link[i]))
            return (B_TRUE); /* dnode is dirty */
    }

However, this check is not "is the dnode dirty?" but rather, "is the
dnode on a list?".

There is a gap in dmu_objset_sync_dnodes() where the dnode is moved from
os_dirty_dnodes to os_synced_dnodes before dnode_sync() is called to
write out the dirty dbufs. During that move there is a moment when the
dnode is on neither list, so the check reports it as clean even though
its dirty dbufs have not been written yet.
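To make the gap concrete, here is a single-threaded C model of that sequence. Everything in it is hypothetical: `sync_model_t` stands in for a dnode, one `linked` flag per txg slot stands in for `multilist_link_active(&dn->dn_dirty_link[i])`, `MODEL_TXG_SIZE` is assumed to be 4, and the move between lists is assumed to be an unlink followed by a link. It compares the old list-membership check with a `dn_dirty_txg`-style check at each step.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TXG_SIZE 4 /* assumed to mirror TXG_SIZE */
#define MODEL_TXG_MASK (MODEL_TXG_SIZE - 1)

typedef struct {
	/* stands in for multilist_link_active(&dn->dn_dirty_link[i]) */
	bool linked[MODEL_TXG_SIZE];
	/* stands in for dn_dirty_txg */
	uint64_t dirty_txg;
} sync_model_t;

/* Old check: really asks "is the dnode on any list?" */
static bool
old_is_dirty(const sync_model_t *dn)
{
	for (int i = 0; i < MODEL_TXG_SIZE; i++) {
		if (dn->linked[i])
			return (true);
	}
	return (false);
}

/* New-style check: positive dirty state, independent of list membership. */
static bool
new_is_dirty(const sync_model_t *dn, uint64_t syncing_txg)
{
	return (dn->dirty_txg >= syncing_txg);
}

/* Modelled step 1 of the move: unlink from os_dirty_dnodes. */
static void
sync_unlink_from_dirty(sync_model_t *dn, uint64_t txg)
{
	dn->linked[txg & MODEL_TXG_MASK] = false;
}

/* Modelled step 2 of the move: link onto os_synced_dnodes. */
static void
sync_link_to_synced(sync_model_t *dn, uint64_t txg)
{
	dn->linked[txg & MODEL_TXG_MASK] = true;
}
```

Between `sync_unlink_from_dirty()` and `sync_link_to_synced()`, `old_is_dirty()` returns false while `new_is_dirty()` still returns true, which is exactly the false "clean" answer described above.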

Taking dn_mtx in the dirty check doesn't help, because that lock does
not protect dn_dirty_link. The os_dirty_dnodes sublist lock is held in
dmu_objset_sync_dnodes(), but taking it in the dirty check could mean
waiting until everything on that sublist has been synced.

The correct fix has to check something that positively asserts the dnode
is dirty, rather than an implementation detail. dn_dirty_txg (via
DNODE_IS_DIRTY()) is exactly that: it's a normal piece of dnode state,
protected by dn_mtx, and it unambiguously indicates whether or not there
are changes pending.
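A minimal sketch of the locking pattern, assuming a POSIX mutex as a stand-in for dn_mtx. `locked_dnode_t`, `locked_dirty()` and `locked_is_dirty()` are hypothetical names; the reader mirrors the shape of the new dnode_is_dirty() (lock, compare, unlock), and the syncing txg is passed in explicitly rather than looked up from the pool.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dnode stand-in: mtx plays the role of dn_mtx. */
typedef struct {
	pthread_mutex_t mtx;
	uint64_t dirty_txg; /* protected by mtx */
} locked_dnode_t;

/* Writer side: record the dirtying txg under the lock (model of dbuf_dirty()). */
static void
locked_dirty(locked_dnode_t *dn, uint64_t txg)
{
	pthread_mutex_lock(&dn->mtx);
	if (txg > dn->dirty_txg)
		dn->dirty_txg = txg;
	pthread_mutex_unlock(&dn->mtx);
}

/* Reader side: one field, one lock, no dependence on list membership. */
static bool
locked_is_dirty(locked_dnode_t *dn, uint64_t syncing_txg)
{
	pthread_mutex_lock(&dn->mtx);
	bool dirty = (dn->dirty_txg >= syncing_txg);
	pthread_mutex_unlock(&dn->mtx);
	return (dirty);
}
```

Because reader and writer agree on a single lock and a single field, the reader never has to wait for a whole sublist to sync, unlike the os_dirty_dnodes sublist lock option.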

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <[email protected]>
robn committed Nov 30, 2023
1 parent a03ebd9 commit 531af60
Showing 1 changed file (module/zfs/dnode.c) with 3 additions and 19 deletions.
@@ -1778,31 +1778,15 @@ dnode_try_claim(objset_t *os, uint64_t object, int slots)
 }
 
 /*
- * Checks if the dnode itself is dirty, or is carrying any uncommitted records.
- * It is important to check both conditions, as some operations (eg appending
- * to a file) can dirty both as a single logical unit, but they are not synced
- * out atomically, so checking one and not the other can result in an object
- * appearing to be clean mid-way through a commit.
- *
- * Do not change this lightly! If you get it wrong, dmu_offset_next() can
- * detect a hole where there is really data, leading to silent corruption.
+ * Check if the dnode (including its data) is dirty on this or any future txg.
  */
 boolean_t
 dnode_is_dirty(dnode_t *dn)
 {
 	mutex_enter(&dn->dn_mtx);
-
-	for (int i = 0; i < TXG_SIZE; i++) {
-		if (multilist_link_active(&dn->dn_dirty_link[i]) ||
-		    !list_is_empty(&dn->dn_dirty_records[i])) {
-			mutex_exit(&dn->dn_mtx);
-			return (B_TRUE);
-		}
-	}
-
+	boolean_t dirty = DNODE_IS_DIRTY(dn);
 	mutex_exit(&dn->dn_mtx);
-
-	return (B_FALSE);
+	return (dirty);
 }
 
 void