
replication received_uuid blocker re snap to share promotion #2902

Closed
Hooverdan96 opened this issue Sep 20, 2024 · 9 comments

@Hooverdan96
Member

Hooverdan96 commented Sep 20, 2024

As observed in the scenario described on the Rockstor community forum (users stevek, Hooverdan, phillxnet), when quotas are NOT enabled on the receiving system, it can happen that a snapshot cannot be promoted because the system fails to set the read-write (rw) property. In this scenario the receiving system was running Rockstor on OpenSUSE Tumbleweed.

https://forum.rockstor.com/t/disk-structure-under-mnt2-and-replication-question/9720/21

The resulting error message implies that using the -f (force) flag will allow the property setting.

ERROR [storageadmin.util:44] Exception: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/fresse_storage/.snapshots/6f32cb58-f849-4c93-bc65-6ebda422c66d_Replication/Replication_6_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force option -f if you really want unset the read-only status. The value of received_uuid is used for incremental send, consider making a snapshot instead. Read more at btrfs-subvolume(8) and Subvolume flags.', '']

[EDIT by phillxnet] A dependency regarding reproducer systems: this is believed to pertain to Leap 15.6 / TW receiver-side systems, where a jump in kernel and btrfs versions was observed: these contain newer safeguards that have led to this -f requirement. See the now associated and merged PR referenced below in the comments.

@Hooverdan96
Member Author

Hooverdan96 commented Sep 20, 2024

In the same forum thread, I documented a PoC with the suggested change:

N.B. quotas are disabled, otherwise this error can be masked by #2901.

changing:

def set_property(mnt_pt, name, val, mount=True):
    if mount is not True or is_mounted(mnt_pt):
        cmd = [BTRFS, "property", "set", mnt_pt, name, val]
        return run_command(cmd)

to:

def set_property(mnt_pt, name, val, mount=True):
    if mount is not True or is_mounted(mnt_pt):
        cmd = [BTRFS, "property", "set", "-f", mnt_pt, name, val]
        return run_command(cmd)

which resulted in successful replications beyond the usual failure point.
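As an aside, a less blunt variant of this PoC would expose the force behaviour as an opt-in keyword rather than always passing -f. The following is only an illustrative sketch, not the merged implementation: the `force` keyword is an assumption, and `is_mounted`/`run_command` are stubbed stand-ins for the existing Rockstor helpers.

```python
# Sketch only: opt-in force flag, so callers other than the snap-to-share
# promotion path keep the old, safer behaviour.
BTRFS = "/usr/sbin/btrfs"


def is_mounted(mnt_pt):
    # Stub: the real Rockstor helper checks the system mount table.
    return True


def run_command(cmd):
    # Stub: the real Rockstor helper executes cmd and wraps failures
    # in CommandException; here we just return the built command.
    return cmd


def set_property(mnt_pt, name, val, mount=True, force=False):
    if mount is not True or is_mounted(mnt_pt):
        cmd = [BTRFS, "property", "set"]
        if force:
            # Newer btrfs-progs refuse ro->rw flips on subvols that have
            # a received_uuid set, unless -f is given.
            cmd.append("-f")
        cmd += [mnt_pt, name, val]
        return run_command(cmd)
```

With this shape, only the repclone (snap-to-share) call site would pass `force=True`, leaving all other property sets unaffected.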

@Hooverdan96 Hooverdan96 changed the title btrfs send/receive failure due to not being able to set read-write (rw) property when promoting snapshot during receive btrfs send/receive - Failure due to not being able to set read-write (rw) property when promoting snapshot during receive Sep 20, 2024
@Hooverdan96 Hooverdan96 changed the title btrfs send/receive - Failure due to not being able to set read-write (rw) property when promoting snapshot during receive btrfs send/receive - Failure due to not being able to set read-write (rw) property when promoting snapshot during replication Sep 20, 2024
@phillxnet phillxnet added this to the 5.1.X-X Stable release milestone Sep 28, 2024
@phillxnet phillxnet self-assigned this Oct 1, 2024
@phillxnet
Member

N.B. I have now observed this failure with quotas enabled (on receiving system):

Reproduced with Rockstor 5.0.14-0 Leap 15.6 send & receive instances:

  • first 3 replication events function as expected.
  • 4th replication event fails as indicated: i.e. during initial snapshot (oldest of now 3) to share promotion.
[01/Oct/2024 11:25:04] INFO [storageadmin.views.snapshot:61] Supplanting share (67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01) with snapshot (.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1).
[01/Oct/2024 11:25:04] ERROR [storageadmin.util:44] Exception: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force if you really want that', '']
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/storageadmin/views/clone_helpers.py", line 94, in create_repclone
    set_property(snap_path, "ro", "false", mount=False)
  File "/opt/rockstor/src/rockstor/fs/btrfs.py", line 2314, in set_property
    return run_command(cmd)
           ^^^^^^^^^^^^^^^^
  File "/opt/rockstor/src/rockstor/system/osi.py", line 289, in run_command
    raise CommandException(cmd, out, err, rc)
system.exceptions.CommandException: Error running a command. cmd = /usr/sbin/btrfs property set /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1 ro false. rc = 1. stdout = ['']. stderr = ['ERROR: cannot flip ro->rw with received_uuid set, use force if you really want that', '']
[01/Oct/2024 11:25:04] ERROR [smart_manager.replication.receiver:100] b'Failed to promote the oldest Snapshot to Share.'. Exception: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/api/shares/16/snapshots/test_share01_1_replication_1/repclone

With the following qgroup details (receiving system):

rleap15-6:~ # btrfs qgroup show /mnt2/rock-pool/ | grep snapshot
...
0/694         16.00KiB     16.00KiB   .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1
0/695         16.00KiB     16.00KiB   .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_2
0/696         16.00KiB     16.00KiB   .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_3

N.B. in this reproducer instance there is no 2015 (rockstor) parent qgroup assignment, only that of the default 0 group.

@phillxnet
Member

phillxnet commented Oct 1, 2024

@Hooverdan96 My previous comment's reproducer details were observed with a trivial data set. This may well explain seeing this error while quotas are enabled. I'll continue with this issue while I have a reproducer, and then look to the quotas-related blocker that likely precedes this issue when there is an actual real-life data payload.

@phillxnet
Member

Likely pertinent historical reference from btrfs mailing list: https://www.spinics.net/lists/linux-btrfs/msg69951.html

@phillxnet
Member

phillxnet commented Oct 1, 2024

Notes on first 3 replication received subvol properties:
Installer sending system -> rleap15-6 receiving system

1st

Send end

No longer available in reproducer systems as the oldest snapshot in replication is deleted.

Receive end

N.B. As this is the first replication event: this subvol has no parent.

rleap15-6:~ # btrfs subvol show /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1
.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_1
        Name:                   test_share01_1_replication_1
        UUID:                   1e35ce1a-de98-6749-9dd7-6acb3dc85ee5
        Parent UUID:            -
        Received UUID:          2f87835e-d4d1-774b-a606-0f4e8763b41a
        Creation time:          2024-10-01 11:10:03 +0100
        Subvolume ID:           694
        Generation:             4921
        Gen at creation:        4917
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           23
        Send time:              2024-10-01 11:10:03 +0100
        Receive transid:        4918
        Receive time:           2024-10-01 11:10:03 +0100
        Snapshot(s):
                                .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_2
        Quota group:            0/694
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB

2nd:

Send end

installer:~ # btrfs subvolume show /mnt2/raid-test/.snapshots/test_share01/test_share01_1_replication_2
.snapshots/test_share01/test_share01_1_replication_2
        Name:                   test_share01_1_replication_2
        UUID:                   7acc89a7-e758-3e4c-b6f0-4e9ae3c7358b
        Parent UUID:            6033150e-c572-3e49-aeb9-94ae1c915163
        Received UUID:          -
        Creation time:          2024-10-01 11:15:04 +0100
        Subvolume ID:           258
        Generation:             53
        Gen at creation:        53
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           0
        Send time:              2024-10-01 11:15:04 +0100
        Receive transid:        0
        Receive time:           -
        Snapshot(s):
        Quota group:            0/258
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB

Receive end

N.B. This subvol has the first (1st above) as its parent UUID subvol. Send/receive works by sending the differences between two subvols.

rleap15-6:~ # btrfs subvol show /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_2
.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_2
        Name:                   test_share01_1_replication_2
        UUID:                   26e85e61-5ba2-4240-b6a7-d75a76ec77bc
        Parent UUID:            1e35ce1a-de98-6749-9dd7-6acb3dc85ee5
        Received UUID:          7acc89a7-e758-3e4c-b6f0-4e9ae3c7358b
        Creation time:          2024-10-01 11:15:04 +0100
        Subvolume ID:           695
        Generation:             4924
        Gen at creation:        4921
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           23
        Send time:              2024-10-01 11:15:04 +0100
        Receive transid:        4922
        Receive time:           2024-10-01 11:15:04 +0100
        Snapshot(s):
                                .snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_3
        Quota group:            0/695
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB

3rd

Send end

installer:~ # btrfs subvolume show /mnt2/raid-test/.snapshots/test_share01/test_share01_1_replication_3
.snapshots/test_share01/test_share01_1_replication_3
        Name:                   test_share01_1_replication_3
        UUID:                   6cbbc48f-2d23-3a42-b09d-2f5504ebb4cf
        Parent UUID:            6033150e-c572-3e49-aeb9-94ae1c915163
        Received UUID:          -
        Creation time:          2024-10-01 11:20:03 +0100
        Subvolume ID:           259
        Generation:             55
        Gen at creation:        55
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           0
        Send time:              2024-10-01 11:20:03 +0100
        Receive transid:        0
        Receive time:           -
        Snapshot(s):
        Quota group:            0/259
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB

Receive end

N.B. In turn, this 3rd subvol has as its parent UUID the above 2nd subvol, and its Received UUID matches the 3rd (sender-end) snapshot above.

rleap15-6:~ # btrfs subvol show /mnt2/rock-pool/.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_3
.snapshots/67bdf5bd-2c16-41d7-8224-ca864f2c0a68_test_share01/test_share01_1_replication_3
        Name:                   test_share01_1_replication_3
        UUID:                   2870b9a3-1a85-694a-99cf-b045160b5a43
        Parent UUID:            26e85e61-5ba2-4240-b6a7-d75a76ec77bc
        Received UUID:          6cbbc48f-2d23-3a42-b09d-2f5504ebb4cf
        Creation time:          2024-10-01 11:20:04 +0100
        Subvolume ID:           696
        Generation:             4924
        Gen at creation:        4924
        Parent ID:              5
        Top level ID:           5
        Flags:                  readonly
        Send transid:           23
        Send time:              2024-10-01 11:20:04 +0100
        Receive transid:        4925
        Receive time:           2024-10-01 11:20:04 +0100
        Snapshot(s):
        Quota group:            0/696
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB

Original (Sending) source share info

Having stopped the sending replication to catch the final state of this replication failure reproducer, the original source (sending side) share we were replicating shows up as follows:

installer:~ # btrfs subvolume show /mnt2/raid-test/test_share01/
test_share01
        Name:                   test_share01
        UUID:                   6033150e-c572-3e49-aeb9-94ae1c915163
        Parent UUID:            -
        Received UUID:          -
        Creation time:          2024-05-28 17:44:48 +0100
        Subvolume ID:           256
        Generation:             57
        Gen at creation:        11
        Parent ID:              5
        Top level ID:           5
        Flags:                  -
        Send transid:           0
        Send time:              2024-05-28 17:44:48 +0100
        Receive transid:        0
        Receive time:           -
        Snapshot(s):
                                .snapshots/test_share01/test_share01_1_replication_2
                                .snapshots/test_share01/test_share01_1_replication_3
                                .snapshots/test_share01/test_share01_1_replication_4
        Quota group:            0/256
          Limit referenced:     -
          Limit exclusive:      -
          Usage referenced:     16.00KiB
          Usage exclusive:      16.00KiB
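The Parent UUID / Received UUID chains traced above can also be checked programmatically by parsing `btrfs subvolume show` output. A minimal illustrative sketch (the key names match the dumps above; the parser itself is an assumption, not Rockstor code):

```python
def parse_subvol_show(text):
    """Parse the 'Key: value' lines of `btrfs subvolume show` output
    into a dict, e.g. to inspect 'Received UUID' before a ro->rw flip."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, val = line.partition(":")
        key, val = key.strip(), val.strip()
        if key and val:
            info[key] = val
    return info


# Abbreviated sample taken from the 1st receive-end dump above.
sample = """\
        Name:                   test_share01_1_replication_1
        UUID:                   1e35ce1a-de98-6749-9dd7-6acb3dc85ee5
        Parent UUID:            -
        Received UUID:          2f87835e-d4d1-774b-a606-0f4e8763b41a
        Flags:                  readonly
"""
info = parse_subvol_show(sample)
# A received subvol: Received UUID is set, so a ro->rw flip needs -f
# on newer btrfs-progs.
needs_force = info.get("Received UUID", "-") != "-"
```

Note that values containing colons (e.g. timestamps) survive intact because only the first colon is treated as the separator.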

@phillxnet
Member

@Hooverdan96 I'm just working through our options here, but remember that we already make allowances in our approach, i.e. the cascade of snapshots. We purposefully do not touch a 'live' receiving snapshot, and such a change is far too large for this late in the testing phase. Our code is such that we could look to improvements later, but not just yet I think. Still working on this one. We do already account for this sensitivity: we were just not actually warned beforehand against what we do. And that warning pertains to the case where the subvol we are modifying is still involved in a send/receive. My understanding is that it is not, due to our precautions re the cascade of sends.

@phillxnet
Member

phillxnet commented Oct 1, 2024

@Hooverdan96 Also note that a clone in btrfs speak is a little different from our clones. Here, as far as my understanding goes, we already follow upstream advice via our snapshot cascade: sending the differences between ro snapshots only. The cascade then allows us to do our 'repclone' (snap-to-share supplant), which is to supplant a share with a snapshot, thereby updating the user-visible replication share. A snapshot is actually a clone (mostly instantaneous), and we already do this as part of our send/receive wrapper. It's where all the complexity comes from, and the purpose of our cascade in the first place. Incidentally, we used to use 5 snapshots! But I changed it to 3 a few years ago: 5 really tended to confuse folks and could take a very long time to end up with the result folks expected: an actual Share at the receiving end :) .

We will have to produce some good technical docs for this whole process, as I have to re-learn it each time I look at it. But I think we have a good design of our own: it's just poorly documented, for both us and general users! Pretty sure we are good to go with your suggested force here: and I didn't see a reference for removing a sending uuid.
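The cascade described in this thread can be illustrated with a small simulation: each replication event receives a new read-only snapshot, and once more than three exist the oldest is promoted to supplant the user-visible share. The retention count of 3 and the failure at the 4th event come from the reproducer above; everything else is an illustrative sketch, not Rockstor's implementation.

```python
from collections import deque

RETAIN = 3  # Rockstor retains 3 replication snapshots (reduced from 5).


def replicate(snapshots, share, event):
    """Simulate one replication event: append a new ro snapshot, and
    once more than RETAIN exist, promote the oldest to supplant the
    user-visible share (the 'repclone' / snap-to-share step)."""
    snapshots.append(f"replication_{event}")
    if len(snapshots) > RETAIN:
        oldest = snapshots.popleft()
        # This is the step that needs `btrfs property set -f ... ro false`
        # on newer btrfs-progs, as the received subvol carries a
        # received_uuid.
        share = oldest
    return share


snaps = deque()
share = None
for n in range(1, 5):
    share = replicate(snaps, share, n)
# After the 4th event, the 1st snapshot has been promoted to the share,
# matching the reproducer: the first 3 events succeed, the 4th triggers
# the promotion (and, without -f, the failure).
```

By the time a snapshot reaches the promotion step it is the oldest of the cascade and no longer serves as the parent for incremental sends, which is why forcing the ro->rw flip is considered safe here.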

@phillxnet phillxnet changed the title btrfs send/receive - Failure due to not being able to set read-write (rw) property when promoting snapshot during replication replication-received_uuid-blocker-re-snap-to-share-promotion Oct 1, 2024
@phillxnet phillxnet changed the title replication-received_uuid-blocker-re-snap-to-share-promotion replication received_uuid blocker re snap to share promotion Oct 1, 2024
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Oct 1, 2024
…#2902

When promoting the oldest of the 3 read-only snapshots received & retained
by the replication service (btrfs send/receive wrapper), use the force flag
during ro-to-rw/snap-to-share transition. At the time of this transition,
this received subvol is no longer used for comparison in any future
replication (btrfs send/receive) events. It represents an older version
of the sending system's associated replication source share. Necessarily
older by way of the constraints of the btrfs send/receive architecture,
and the safeguards of the replication wrapper: a cascade of ro snapshots.
@Hooverdan96
Member Author

Yes, that explanation makes sense in the cloning context. And the point being that the third of these cascading snapshots will not be changed between setting the read-write flag and its promotion to a share.

phillxnet added a commit that referenced this issue Oct 3, 2024
…d-blocker-re-snap-to-share-promotion

replication received_uuid blocker re snap to share promotion #2902
@phillxnet
Member

Closing as:
Fixed by #2911
