test(robot): migrate test_replica_rebuild_per_volume_limit #2159

Merged
12 changes: 12 additions & 0 deletions e2e/keywords/volume.resource
@@ -99,6 +99,10 @@ Delete volume ${volume_id} replica on ${replica_locality}
${volume_name} = generate_name_with_suffix volume ${volume_id}
delete_replica_on_node ${volume_name} ${replica_locality}

Delete ${count} replicas of volume ${volume_id}
${volume_name} = generate_name_with_suffix volume ${volume_id}
delete_replicas ${volume_name} ${count}

Wait for volume ${volume_id} healthy
${volume_name} = generate_name_with_suffix volume ${volume_id}
wait_for_volume_healthy ${volume_name}
@@ -178,6 +182,10 @@ Wait until volume ${volume_id} replicas rebuilding completed
${volume_name} = generate_name_with_suffix volume ${volume_id}
wait_for_replica_rebuilding_to_complete ${volume_name}

Monitor only one replica rebuilding will start at a time for volume ${volume_id}
${volume_name} = generate_name_with_suffix volume ${volume_id}
wait_for_replica_rebuilding_to_complete ${volume_name}

Comment on lines +185 to +188

⚠️ Potential issue

Keyword implementation may not match its intended purpose.

The keyword name suggests monitoring that only one replica rebuilds at a time, but the implementation only waits for rebuilding completion. This might not effectively validate the "one at a time" constraint.

Consider implementing proper monitoring logic:

 Monitor only one replica rebuilding will start at a time for volume ${volume_id}
     ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
-    wait_for_replica_rebuilding_to_complete   ${volume_name}
+    only_one_replica_rebuilding_will_start_at_a_time    ${volume_name}

Committable suggestion skipped: line range outside the PR's diff.
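
For illustration, a minimal sketch of what such a monitoring keyword could do: poll the volume and assert that at most one replica is ever in write-only (WO) mode, mirroring the rebuilding_count check added to rest.py in this PR. The get_volume callable, retry defaults, and replica attribute names here are assumptions for the sketch, not the actual library API.

import time

def monitor_single_replica_rebuilding(get_volume, volume_name,
                                      retry_count=300, retry_interval=1):
    """Fail if more than one replica of the volume rebuilds at the same time.

    get_volume is a hypothetical callable returning a volume object whose
    replicas expose a mode attribute ("WO" while rebuilding, "RW" when done).
    """
    for _ in range(retry_count):
        volume = get_volume(volume_name)
        # Replicas in write-only (WO) mode are the ones currently rebuilding.
        rebuilding = [r for r in volume.replicas if r.mode == "WO"]
        assert len(rebuilding) <= 1, \
            f"Unexpected concurrent replica rebuilding: {len(rebuilding)}"
        # Done once every replica is back in read-write (RW) mode.
        if volume.replicas and all(r.mode == "RW" for r in volume.replicas):
            return
        time.sleep(retry_interval)
    raise AssertionError(f"Volume {volume_name} replicas did not finish rebuilding in time")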

Wait until volume ${volume_id} replica rebuilding stopped on ${replica_locality}
${volume_name} = generate_name_with_suffix volume ${volume_id}
wait_for_replica_rebuilding_to_stop_on_node ${volume_name} ${replica_locality}
@@ -192,6 +200,10 @@ Both volume ${volume_id_0} and volume ${volume_id_1} replica rebuilding on ${rep
${volume_name_1} = generate_name_with_suffix volume ${volume_id_1}
both_replica_rebuildings_will_start_at_the_same_time_on_node ${volume_name_0} ${volume_name_1} ${replica_locality}

Only one replica rebuilding will start at a time for volume ${volume_id}
${volume_name} = generate_name_with_suffix volume ${volume_id}
only_one_replica_rebuilding_will_start_at_a_time ${volume_name}

Crash volume ${volume_id} replica processes
${volume_name} = generate_name_with_suffix volume ${volume_id}
crash_replica_processes ${volume_name}
28 changes: 23 additions & 5 deletions e2e/libs/keywords/volume_keywords.py
@@ -128,6 +128,13 @@ def delete_replica_on_nodes(self, volume_name, replica_locality):
logging(f"Deleting volume {volume_name}'s replica on node {node_id}")
self.volume.delete_replica(volume_name, node_id)

def delete_replicas(self, volume_name, count):
replica_list = self.replica.get(volume_name, node_name="")
replica_names = [replica['metadata']['name'] for replica in replica_list]
for i in range(int(count)):
logging(f"Deleting volume {volume_name} replica volume {replica_names[i]}")
self.volume.delete_replica_by_name(volume_name, replica_names[i])

Comment on lines +131 to +137

🛠️ Refactor suggestion

Add input validation and error handling.

The method needs additional safeguards:

  1. Validate that count is a positive integer and doesn't exceed available replicas
  2. Add error handling for the delete operation

Consider this implementation:

 def delete_replicas(self, volume_name, count):
+    try:
+        count = int(count)
+        if count <= 0:
+            raise ValueError("Count must be positive")
+
         replica_list = self.replica.get(volume_name, node_name="")
         replica_names = [replica['metadata']['name'] for replica in replica_list]
+        if count > len(replica_names):
+            raise ValueError(f"Cannot delete {count} replicas, only {len(replica_names)} available")
+
         for i in range(int(count)):
             logging(f"Deleting volume {volume_name} replica volume {replica_names[i]}")
-            self.volume.delete_replica_by_name(volume_name, replica_names[i])
+            try:
+                self.volume.delete_replica_by_name(volume_name, replica_names[i])
+            except Exception as e:
+                logging(f"Failed to delete replica {replica_names[i]}: {str(e)}")
+                raise
+    except ValueError as e:
+        logging(f"Invalid input: {str(e)}")
+        raise
+    except Exception as e:
+        logging(f"Failed to delete replicas: {str(e)}")
+        raise

def set_annotation(self, volume_name, annotation_key, annotation_value):
self.volume.set_annotation(volume_name, annotation_key, annotation_value)

@@ -152,11 +159,7 @@ def wait_for_replica_rebuilding_to_complete_on_node(self, volume_name, replica_l
self.volume.wait_for_replica_rebuilding_complete(volume_name, node_name)

def wait_for_replica_rebuilding_to_complete(self, volume_name):
for node_name in self.node.list_node_names_by_role("worker"):
if self.node.is_node_schedulable(node_name) == "False":
continue
logging(f"Waiting for volume {volume_name}'s replica on node {node_name} rebuilding completed")
self.volume.wait_for_replica_rebuilding_complete(volume_name, node_name)
self.volume.wait_for_replica_rebuilding_complete(volume_name)

async def only_one_replica_rebuilding_will_start_at_a_time_on_node(self, volume_name_0, volume_name_1, replica_locality):

@@ -203,6 +206,21 @@ async def wait_for_both_replica_rebuildings():
assert self.volume.is_replica_rebuilding_in_progress(volume_name_0, node_id) and self.volume.is_replica_rebuilding_in_progress(volume_name_1, node_id), \
f"Expect {volume_name_0} and {volume_name_1} replica rebuilding at the same time"

async def only_one_replica_rebuilding_will_start_at_a_time(self, volume_name):

async def wait_for_replica_rebuilding():
tasks = [
asyncio.create_task(self.volume.wait_for_replica_rebuilding_start(volume_name), name=volume_name),
]

done, pending = await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
logging(f"Observed {done.pop().get_name()} started replica rebuilding")

await wait_for_replica_rebuilding()

assert self.volume.is_replica_rebuilding_in_progress(volume_name), \
f"Expect {volume_name} replica rebuilding in progress"

def crash_replica_processes(self, volume_name):
self.volume.crash_replica_processes(volume_name)

19 changes: 18 additions & 1 deletion e2e/libs/volume/crd.py
@@ -420,6 +420,23 @@ def delete_replica(self, volume_name, node_name):
name=replica_list['items'][0]['metadata']['name']
)

def delete_replica_by_name(self, volume_name, replica_name):
replica = self.obj_api.get_namespaced_custom_object(
group="longhorn.io",
version="v1beta2",
namespace="longhorn-system",
plural="replicas",
name=replica_name
)
logging(f"Deleting replica {replica['metadata']['name']}")
self.obj_api.delete_namespaced_custom_object(
group="longhorn.io",
version="v1beta2",
namespace="longhorn-system",
plural="replicas",
name=replica['metadata']['name']
)

Comment on lines +423 to +439

🛠️ Refactor suggestion

Add error handling and optimize replica deletion.

The implementation could be improved in the following ways:

  1. Add error handling for non-existent replicas
  2. Remove redundant metadata access since we already have the replica name

Consider this implementation:

 def delete_replica_by_name(self, volume_name, replica_name):
-    replica = self.obj_api.get_namespaced_custom_object(
-        group="longhorn.io",
-        version="v1beta2",
-        namespace="longhorn-system",
-        plural="replicas",
-        name=replica_name
-    )
-    logging(f"Deleting replica {replica['metadata']['name']}")
-    self.obj_api.delete_namespaced_custom_object(
-        group="longhorn.io",
-        version="v1beta2",
-        namespace="longhorn-system",
-        plural="replicas",
-        name=replica['metadata']['name']
-    )
+    try:
+        logging(f"Deleting replica {replica_name}")
+        self.obj_api.delete_namespaced_custom_object(
+            group="longhorn.io",
+            version="v1beta2",
+            namespace="longhorn-system",
+            plural="replicas",
+            name=replica_name
+        )
+    except ApiException as e:
+        if e.status == 404:
+            logging(f"Replica {replica_name} not found")
+        else:
+            raise e

def wait_for_replica_rebuilding_start(self, volume_name, node_name):
return Rest().wait_for_replica_rebuilding_start(volume_name, node_name)

@@ -432,7 +449,7 @@ def crash_replica_processes(self, volume_name):
def crash_node_replica_process(self, volume_name, node_name):
return Rest().crash_node_replica_process(volume_name, node_name)

def wait_for_replica_rebuilding_complete(self, volume_name, node_name):
def wait_for_replica_rebuilding_complete(self, volume_name, node_name=None):
return Rest().wait_for_replica_rebuilding_complete(volume_name, node_name)

def check_data_checksum(self, volume_name, data_id):
69 changes: 50 additions & 19 deletions e2e/libs/volume/rest.py
@@ -118,29 +118,42 @@ def keep_writing_data(self, volume_name, size):
def delete_replica(self, volume_name, node_name):
return NotImplemented

async def wait_for_replica_rebuilding_start(self, volume_name, node_name):
async def wait_for_replica_rebuilding_start(self, volume_name, node_name=None):
rebuilding_replica_name = None
for i in range(self.retry_count):
try:
v = get_longhorn_client().by_id_volume(volume_name)
logging(f"Trying to get volume {volume_name} rebuilding replicas ... ({i})")
for replica in v.replicas:
if replica.hostId == node_name:
if node_name and replica.hostId == node_name and replica.mode == "WO":
rebuilding_replica_name = replica.name
break
elif replica.mode == "WO":
rebuilding_replica_name = replica.name
node_name = replica.hostId
break
if rebuilding_replica_name:
break
except Exception as e:
logging(f"Failed to get volume {volume_name} with error: {e}")
await asyncio.sleep(self.retry_interval)
assert rebuilding_replica_name != None
assert rebuilding_replica_name != None, f"Waiting for replica rebuilding start for volume {volume_name} on node {node_name} failed: replicas = {v.replicas}"
logging(f"Got volume {volume_name} rebuilding replica = {rebuilding_replica_name} on node {node_name}")

started = False
for i in range(self.retry_count):
try:
v = get_longhorn_client().by_id_volume(volume_name)
logging(f"Got volume {volume_name} rebuild status = {v.rebuildStatus}")

# During monitoring replica rebuilding
# at the same time monitoring if there are unexpected concurrent replica rebuilding
rebuilding_count = 0
for replica in v.replicas:
if replica.mode == "WO":
rebuilding_count +=1
assert rebuilding_count <= 1, f"Unexpected concurrent replica rebuilding = {rebuilding_count}, replicas = {v.replicas}"

for status in v.rebuildStatus:
for replica in v.replicas:
if status.replica == replica.name and \
@@ -156,7 +169,7 @@ async def wait_for_replica_rebuilding_start(self, volume_name, node_name):
await asyncio.sleep(self.retry_interval)
assert started, f"wait for replica on node {node_name} rebuilding timeout: {v}"

def is_replica_rebuilding_in_progress(self, volume_name, node_name):
def is_replica_rebuilding_in_progress(self, volume_name, node_name=None):
in_progress = False
for i in range(self.retry_count):
try:
@@ -165,8 +178,9 @@ def is_replica_rebuilding_in_progress(self, volume_name, node_name):
for status in v.rebuildStatus:
for replica in v.replicas:
if status.replica == replica.name and \
replica.hostId == node_name and \
(node_name is None or replica.hostId == node_name) and \
status.state == "in_progress":
node_name = replica.hostId if not node_name else node_name
logging(f"Volume {volume_name} replica rebuilding {replica.name} in progress on {node_name}")
in_progress = True
break
@@ -217,31 +231,48 @@ def get_replica_name_on_node(self, volume_name, node_name):
if r.hostId == node_name:
return r.name

def wait_for_replica_rebuilding_complete(self, volume_name, node_name):
def wait_for_replica_rebuilding_complete(self, volume_name, node_name=None):
completed = False
for i in range(self.retry_count):
logging(f"wait for {volume_name} replica rebuilding completed on {node_name} ... ({i})")
logging(f"wait for {volume_name} replica rebuilding completed on {'all nodes' if not node_name else node_name} ... ({i})")
try:
v = get_longhorn_client().by_id_volume(volume_name)

# During monitoring replica rebuilding
# at the same time monitoring if there are unexpected concurrent replica rebuilding
rebuilding_count = 0
for replica in v.replicas:
# use replica.mode is RW or RO to check if this replica
# has been rebuilt or not
# because rebuildStatus is not reliable
# when the rebuild progress reaches 100%
# it will be removed from rebuildStatus immediately
# and you will just get an empty rebuildStatus []
# so it's no way to distinguish "rebuilding not started yet"
# or "rebuilding already completed" using rebuildStatus
if replica.hostId == node_name and replica.mode == "RW":
if replica.mode == "WO":
rebuilding_count +=1
assert rebuilding_count <= 1, f"Unexpected concurrent replica rebuilding = {rebuilding_count}, replicas = {v.replicas}"

if node_name:
for replica in v.replicas:
# use replica.mode is RW or RO to check if this replica
# has been rebuilt or not
# because rebuildStatus is not reliable
# when the rebuild progress reaches 100%
# it will be removed from rebuildStatus immediately
# and you will just get an empty rebuildStatus []
# so it's no way to distinguish "rebuilding not started yet"
# or "rebuilding already completed" using rebuildStatus
if replica.hostId == node_name and replica.mode == "RW":
completed = True
break
else:
rw_replica_count = 0
for replica in v.replicas:
if replica.mode == "RW":
rw_replica_count += 1
if rw_replica_count == v.numberOfReplicas:
completed = True
break
if completed:
break
except Exception as e:
logging(f"Failed to get volume {volume_name} with error: {e}")
time.sleep(self.retry_interval)
logging(f"Completed volume {volume_name} replica rebuilding on {node_name}")
assert completed, f"Expect volume {volume_name} replica rebuilding completed on {node_name}"
logging(f"Completed volume {volume_name} replica rebuilding on {'all nodes' if not node_name else node_name}")
assert completed, f"Expect volume {volume_name} replica rebuilding completed on {'all nodes' if not node_name else node_name}"

def check_data_checksum(self, volume_name, data_id):
return NotImplemented
9 changes: 6 additions & 3 deletions e2e/libs/volume/volume.py
@@ -101,10 +101,13 @@ def keep_writing_data(self, volume_name):
def delete_replica(self, volume_name, node_name):
return self.volume.delete_replica(volume_name, node_name)

def wait_for_replica_rebuilding_start(self, volume_name, node_name):
def delete_replica_by_name(self, volume_name, replica_name):
return self.volume.delete_replica_by_name(volume_name, replica_name)

def wait_for_replica_rebuilding_start(self, volume_name, node_name=None):
return self.volume.wait_for_replica_rebuilding_start(volume_name, node_name)

def is_replica_rebuilding_in_progress(self, volume_name, node_name):
def is_replica_rebuilding_in_progress(self, volume_name, node_name=None):
return self.volume.is_replica_rebuilding_in_progress(volume_name, node_name)

def crash_replica_processes(self, volume_name):
@@ -122,7 +125,7 @@ def wait_for_replica_running(self, volume_name, node_name):
def get_replica_name_on_node(self, volume_name, node_name):
return self.volume.get_replica_name_on_node(volume_name, node_name)

def wait_for_replica_rebuilding_complete(self, volume_name, node_name):
def wait_for_replica_rebuilding_complete(self, volume_name, node_name=None):
return self.volume.wait_for_replica_rebuilding_complete(volume_name, node_name)
Comment on lines +128 to 129

💡 Codebase verification

Inconsistency found in node_name parameter handling

The verification reveals inconsistencies in the node_name parameter handling across implementations:

  • base.py: All methods declare node_name as required parameter
  • rest.py: All methods correctly implement node_name=None as optional
  • crd.py: Only wait_for_replica_rebuilding_complete has node_name=None, while other rebuilding methods have required node_name
  • volume.py: All methods correctly implement node_name=None as optional

The base class and CRD implementation need to be updated to maintain consistency with the optional node_name parameter pattern.
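
As a rough sketch of the alignment being asked for, the abstract declarations could default node_name to None so that every implementation shares the same optional-parameter contract. The class name and method bodies below are illustrative assumptions, not the actual base.py contents.

from abc import ABC, abstractmethod

class VolumeBase(ABC):
    # Illustrative only: node_name defaults to None in every rebuilding-related method.

    @abstractmethod
    def wait_for_replica_rebuilding_start(self, volume_name, node_name=None):
        raise NotImplementedError

    @abstractmethod
    def is_replica_rebuilding_in_progress(self, volume_name, node_name=None):
        raise NotImplementedError

    @abstractmethod
    def wait_for_replica_rebuilding_complete(self, volume_name, node_name=None):
        raise NotImplementedError

Concrete classes such as CRD and Rest would then keep the same defaults, letting callers omit node_name to target all nodes.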

🔗 Analysis chain

Add docstring and verify implementation consistency.

Add a docstring to explain the behavior with and without node_name parameter.

     def wait_for_replica_rebuilding_complete(self, volume_name, node_name=None):
+        """Wait for replica rebuilding to complete.
+        
+        Args:
+            volume_name: Name of the volume
+            node_name: Optional. If provided, wait for rebuilding completion on specific node.
+                      If None, wait for any replica rebuilding completion.
+        """
         return self.volume.wait_for_replica_rebuilding_complete(volume_name, node_name)

Let's verify that the underlying implementations in CRD and Rest classes handle the optional node_name consistently:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if CRD and Rest implementations handle optional node_name consistently

# Test: Search for the implementation of these methods in CRD and Rest classes
echo "Checking implementations in CRD and Rest classes..."
rg -A 10 "def (wait_for_replica_rebuilding_start|is_replica_rebuilding_in_progress|wait_for_replica_rebuilding_complete)" "e2e/libs/volume/"

Length of output: 7703


def check_data_checksum(self, volume_name, data_id):
51 changes: 51 additions & 0 deletions e2e/tests/regression/test_replica.robot
@@ -0,0 +1,51 @@
*** Settings ***
Documentation Replica Test Cases

Test Tags regression

Resource ../keywords/common.resource
Resource ../keywords/volume.resource
Resource ../keywords/setting.resource
Resource ../keywords/deployment.resource
Resource ../keywords/persistentvolumeclaim.resource
Resource ../keywords/workload.resource

Test Setup Set test environment
Test Teardown Cleanup test resources

*** Variables ***
${LOOP_COUNT} 1
${RETRY_COUNT} 300
${RETRY_INTERVAL} 1
${DATA_ENGINE} v1

*** Test Cases ***
Test Replica Rebuilding Per Volume Limit
[Tags] coretest
[Documentation] Test the volume always only have one replica scheduled for rebuild
...
... 1. Set soft anti-affinity to `true`.
... 2. Create a volume with 1 replica.
... 3. Attach the volume and write a few hundreds MB data to it.
Comment on lines +27 to +29
@coderabbitai coderabbitai bot Nov 8, 2024

⚠️ Potential issue

Documentation inconsistency with implementation.

The documentation states "Create a volume with 1 replica" but the implementation creates a volume with 3 replicas.

Either update the documentation or adjust the implementation to match:

-    ...    2. Create a volume with 1 replica.
+    ...    2. Create a volume with 3 replicas.

Contributor

Agree with this comment for making test description and test code more consistent.


@chriscchien, glad to hear that you agree. Updating the documentation step to match the implementation will ensure consistency. Here's the updated step:

    ...    2. Create a volume with 3 replicas.

Let me know if you need any further assistance.

... 4. Scale the volume replica to 5.
... 5. Monitor the volume replica list to make sure there should be only 1 replica in WO state.
... 6. Wait for the volume to complete rebuilding. Then remove 4 of the 5 replicas.
... 7. Monitoring the volume replica list again.
... 8. Once the rebuild was completed again, verify the data checksum.
Given Set setting replica-soft-anti-affinity to true
And Create volume 0 with numberOfReplicas=3 dataEngine=${DATA_ENGINE}
And Attach volume 0
And Wait for volume 0 healthy
And Write data to volume 0

Comment on lines +39 to +40

🛠️ Refactor suggestion

Add validation for data write operation.

The test writes data but doesn't verify the write operation's success before proceeding with replica operations.

 And Write data to volume 0
+And Verify data write succeeded on volume 0

When Update volume 0 replica count to 5
Then Only one replica rebuilding will start at a time for volume 0
And Monitor only one replica rebuilding will start at a time for volume 0
And Wait until volume 0 replicas rebuilding completed
Contributor
@chriscchien chriscchien Nov 8, 2024

Is it possible to change the keyword name Wait until volume 0 replicas rebuilding completed to emphasize that it ensures only one replica is in the rebuilding (WO) state until the rebuild is completed, to better match the test case name Test Replica Rebuilding Per Volume Limit? Besides this, no further questions. Thank you.

Member Author

@chriscchien Could you suggest an appropriate name?

Contributor

How about Only one replica for volume 0 rebuilds at a time until the rebuild is completed for the two steps below?

 Then Only one replica rebuilding will start at a time for volume 0
 And Wait until volume 0 replicas rebuilding completed

Member Author

@chriscchien How about adding another step Monitor only one replica rebuilding will start at a time for volume 0:

Then Only one replica rebuilding will start at a time for volume 0
And Monitor only one replica rebuilding will start at a time for volume 0
And Wait until volume 0 replicas rebuilding completed

This aligns with the description in the test skeleton.


When Delete 4 replicas of volume 0
Then Only one replica rebuilding will start at a time for volume 0
And Monitor only one replica rebuilding will start at a time for volume 0
And Wait until volume 0 replicas rebuilding completed
And Wait for volume 0 healthy
And Check volume 0 data is intact