btrfs: don't hold dev_replace rwsem over whole of btrfs_map_block
Don't hold the dev_replace rwsem for the entirety of btrfs_map_block().

It is only needed to protect:

a) calls to find_live_mirror() and
b) calls into handle_ops_on_dev_replace().

But there is no need to hold the rwsem for any of the set_io_stripe()
calls.

So take the dev_replace rwsem only around these two cases and, when
re-taking it for the second one, check whether the device replace status
has changed in the meantime. If it has, the find_live_mirror() calls have
to be redone.

This fixes a deadlock on raid-stripe-tree where device replace performs a
scrub operation, which in turn calls into btrfs_map_block() to find the
physical location of the block.

Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Qu Wenruo <[email protected]>
Reviewed-by: Filipe Manana <[email protected]>
Signed-off-by: Johannes Thumshirn <[email protected]>
Signed-off-by: David Sterba <[email protected]>
morbidrsa authored and kdave committed Jul 18, 2024
1 parent 6a2611b commit 15cecde
Showing 1 changed file with 18 additions and 11 deletions.

fs/btrfs/volumes.c
```diff
@@ -6650,14 +6650,9 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 	max_len = btrfs_max_io_len(map, map_offset, &io_geom);
 	*length = min_t(u64, map->chunk_len - map_offset, max_len);
 
+again:
 	down_read(&dev_replace->rwsem);
 	dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
-	/*
-	 * Hold the semaphore for read during the whole operation, write is
-	 * requested at commit time but must wait.
-	 */
-	if (!dev_replace_is_ongoing)
-		up_read(&dev_replace->rwsem);
 
 	switch (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
 	case BTRFS_BLOCK_GROUP_RAID0:
```
```diff
@@ -6695,6 +6690,7 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 			   "stripe index math went horribly wrong, got stripe_index=%u, num_stripes=%u",
 			   io_geom.stripe_index, map->num_stripes);
 		ret = -EINVAL;
+		up_read(&dev_replace->rwsem);
 		goto out;
 	}
 
```
```diff
@@ -6710,6 +6706,8 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		 */
 		num_alloc_stripes += 2;
 
+	up_read(&dev_replace->rwsem);
+
 	/*
 	 * If this I/O maps to a single device, try to return the device and
 	 * physical block information on the stack instead of allocating an
```
```diff
@@ -6782,25 +6780,34 @@ int btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 		goto out;
 	}
 
+	/*
+	 * Check if something changed the dev_replace state since
+	 * we've checked it for the last time and if redo the whole
+	 * mapping operation.
+	 */
+	down_read(&dev_replace->rwsem);
+	if (dev_replace_is_ongoing !=
+	    btrfs_dev_replace_is_ongoing(dev_replace)) {
+		btrfs_put_bioc(bioc);
+		up_read(&dev_replace->rwsem);
+		goto again;
+	}
+
 	if (op != BTRFS_MAP_READ)
 		io_geom.max_errors = btrfs_chunk_max_errors(map);
 
 	if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL &&
 	    op != BTRFS_MAP_READ) {
 		handle_ops_on_dev_replace(bioc, dev_replace, logical, &io_geom);
 	}
+	up_read(&dev_replace->rwsem);
 
 	*bioc_ret = bioc;
 	bioc->num_stripes = io_geom.num_stripes;
 	bioc->max_errors = io_geom.max_errors;
 	bioc->mirror_num = io_geom.mirror_num;
 
 out:
-	if (dev_replace_is_ongoing) {
-		lockdep_assert_held(&dev_replace->rwsem);
-		/* Unlock and let waiting writers proceed */
-		up_read(&dev_replace->rwsem);
-	}
 	btrfs_free_chunk_map(map);
 	return ret;
 }
```
