
test(robot): migrate test_replica_rebuild_per_volume_limit #2159

Merged

Conversation

Member

@yangchiu yangchiu commented Nov 8, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9774

What this PR does / why we need it:

Migrate test_replica_rebuild_per_volume_limit from pytest to Robot Framework.

Special notes for your reviewer:

Additional documentation or context

Summary by CodeRabbit

  • New Features

    • Introduced functionality to delete a specified number of replicas for a volume.
    • Added a constraint ensuring only one replica can be rebuilt at a time for a volume.
  • Bug Fixes

    • Enhanced error handling and logging for replica deletion processes.
  • Tests

    • Added a regression test case to verify the behavior of volume replicas, ensuring only one replica is rebuilt at a time.

@yangchiu yangchiu self-assigned this Nov 8, 2024
@yangchiu yangchiu requested a review from a team as a code owner November 8, 2024 03:19

coderabbitai bot commented Nov 8, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

The changes introduce new functionalities for managing volume replicas in the system. Two new keywords are added for deleting a specified number of replicas and enforcing a constraint on replica rebuilding. Additionally, several methods in the volume_keywords, CRD, Rest, and Volume classes are updated to enhance replica management capabilities. A new regression test file is also created to validate the behavior of volume replicas, ensuring that only one replica rebuilding occurs at a time.
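
As context for the async keyword mentioned in the walkthrough, here is a minimal, self-contained sketch of the asyncio wait pattern it appears to rely on; the waiter function and the task name below are illustrative stand-ins, not the library's actual code:

import asyncio

# Stand-in for the real wait_for_replica_rebuilding_start poll.
async def fake_wait_for_rebuilding_start(volume_name):
    await asyncio.sleep(0.1)  # pretend we polled until rebuilding started

async def main():
    # Name the task after the volume so the completion log can identify it.
    task = asyncio.create_task(fake_wait_for_rebuilding_start("vol-0"), name="vol-0")
    done, _pending = await asyncio.wait({task}, return_when=asyncio.ALL_COMPLETED)
    print(f"Observed {done.pop().get_name()} started replica rebuilding")

asyncio.run(main())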

Changes

  • e2e/keywords/volume.resource: Added keywords Delete ${count} replicas of volume ${volume_id} and Only one replica rebuilding will start at a time for volume ${volume_id}.
  • e2e/libs/keywords/volume_keywords.py: Added delete_replicas(volume_name, count) and the async only_one_replica_rebuilding_will_start_at_a_time(volume_name); simplified wait_for_replica_rebuilding_to_complete.
  • e2e/libs/volume/crd.py: Added delete_replica_by_name(volume_name, replica_name); updated wait_for_replica_rebuilding_complete to accept an optional node_name.
  • e2e/libs/volume/rest.py: Updated wait_for_replica_rebuilding_start, is_replica_rebuilding_in_progress, and wait_for_replica_rebuilding_complete to accept an optional node_name.
  • e2e/libs/volume/volume.py: Added delete_replica_by_name(volume_name, replica_name); updated several methods to accept node_name=None.
  • e2e/tests/regression/test_replica.robot: Introduced the test case Test Replica Rebuilding Per Volume Limit to verify replica rebuilding behavior.

Assessment against linked issues

  • Migrate existing pytest test cases to robot (9774): addressed ✅

Possibly related PRs

Suggested reviewers

  • chriscchien

🐇 In the meadow, where replicas play,
New keywords hop in, brightening the day.
Deleting and rebuilding, all in a line,
With tests to ensure everything's fine.
So let’s cheer for the changes, a joyful delight,
For managing volumes just got more bright! 🌼



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (14)
e2e/tests/regression/test_replica.robot (4)

1-15: Consider enhancing the documentation.

While the basic documentation is present, it would be beneficial to add more details about:

  • The purpose of these replica test cases
  • Prerequisites for running these tests
  • Expected test environment requirements
-Documentation    Replica Test Cases
+Documentation    Test cases validating Longhorn volume replica behavior,
+...             including replica rebuilding limits and scheduling.
+...             Prerequisites: Longhorn cluster with sufficient nodes
+...             for replica distribution.

16-20: Add documentation for test variables.

The variables lack documentation explaining their purpose and chosen values. This information is crucial for maintainability.

 *** Variables ***
+# Number of test iterations
 ${LOOP_COUNT}    1
+# Maximum number of retry attempts for operations
 ${RETRY_COUNT}    300
+# Time interval between retries in seconds
 ${RETRY_INTERVAL}    1
+# Data engine version to use for volume creation
 ${DATA_ENGINE}    v1

35-49: Add soft anti-affinity validation.

While soft anti-affinity is enabled, the test doesn't verify that replicas are actually scheduled on different nodes.

Add a verification step after replica creation:

 And Wait for volume 0 healthy
+And Verify volume 0 replicas are on different nodes
 And Write data to volume 0

23-25: Consider adding error scenarios.

While the happy path is well tested, consider adding test cases for:

  1. Node failure during replica rebuilding
  2. Network partition scenarios
  3. Disk full conditions

Would you like me to help create additional test cases for these scenarios?

e2e/libs/volume/volume.py (2)

107-108: Add docstring to clarify node_name parameter usage.

Consider adding a docstring to explain the behavior when node_name is provided vs. when it's None. This will help other developers understand how to use this method effectively.

     def wait_for_replica_rebuilding_start(self, volume_name, node_name=None):
+        """Wait for replica rebuilding to start.
+        
+        Args:
+            volume_name: Name of the volume
+            node_name: Optional. If provided, wait for rebuilding on specific node.
+                      If None, wait for any replica rebuilding.
+        """
         return self.volume.wait_for_replica_rebuilding_start(volume_name, node_name)

110-111: Add docstring to clarify node_name parameter usage.

Similar to the previous method, add a docstring to explain the behavior with and without node_name parameter.

     def is_replica_rebuilding_in_progress(self, volume_name, node_name=None):
+        """Check if replica rebuilding is in progress.
+        
+        Args:
+            volume_name: Name of the volume
+            node_name: Optional. If provided, check rebuilding on specific node.
+                      If None, check for any replica rebuilding.
+        Returns:
+            bool: True if rebuilding is in progress, False otherwise.
+        """
         return self.volume.is_replica_rebuilding_in_progress(volume_name, node_name)
e2e/keywords/volume.resource (2)

102-104: Add documentation for the new keyword.

The implementation looks good and follows the established patterns. Consider adding documentation to describe the purpose and behavior of this keyword, similar to other keywords in the file.

 Delete ${count} replicas of volume ${volume_id}
+    [Documentation]    Delete the specified number of replicas from a volume.
     ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
     delete_replicas    ${volume_name}    ${count}

199-201: Add documentation for the new keyword.

The implementation looks good and follows the established patterns. Consider adding documentation to describe the purpose and behavior of this keyword, similar to other keywords in the file.

 Only one replica rebuilding will start at a time for volume ${volume_id}
+    [Documentation]    Verify that only one replica rebuilding process occurs at a time for the specified volume.
     ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
     only_one_replica_rebuilding_will_start_at_a_time    ${volume_name}
e2e/libs/keywords/volume_keywords.py (1)

209-223: Improve clarity and add timeout handling.

The implementation could be enhanced for better clarity and robustness:

Consider these improvements:

 async def only_one_replica_rebuilding_will_start_at_a_time(self, volume_name):
+    timeout = 300  # Add appropriate timeout value
 
-    async def wait_for_replica_rebuilding():
-        tasks = [
-            asyncio.create_task(self.volume.wait_for_replica_rebuilding_start(volume_name), name=volume_name),
-        ]
-
-        done, pending = await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED)
-        logging(f"Observed {done.pop().get_name()} started replica rebuilding")
-
-    await wait_for_replica_rebuilding()
+    try:
+        logging(f"Waiting for replica rebuilding to start for volume {volume_name}")
+        await asyncio.wait_for(
+            self.volume.wait_for_replica_rebuilding_start(volume_name),
+            timeout=timeout
+        )
+        logging(f"Replica rebuilding started for volume {volume_name}")
 
-    assert self.volume.is_replica_rebuilding_in_progress(volume_name), \
-        f"Expect {volume_name} replica rebuilding in progress"
+        assert self.volume.is_replica_rebuilding_in_progress(volume_name), \
+            f"Expected volume {volume_name} to have replica rebuilding in progress, but no rebuilding was detected"
+    except asyncio.TimeoutError:
+        raise TimeoutError(f"Timeout waiting for replica rebuilding to start for volume {volume_name} after {timeout} seconds")
e2e/libs/volume/rest.py (5)

140-140: Use is not None for None comparison

In Python, it's recommended to use is not None when comparing to None for clarity and adherence to PEP 8 guidelines.

Apply this diff to correct the comparison:

-            assert rebuilding_replica_name != None, f"Waiting for replica rebuilding start for volume {volume_name} on node {node_name} failed: replicas = {v.replicas}"
+            assert rebuilding_replica_name is not None, f"Waiting for replica rebuilding start for volume {volume_name} on node {node_name} failed: replicas = {v.replicas}"


144-144: Replace unused loop variable i with underscore _

The loop variable i is not used within the loop body. Using _ indicates that the variable is intentionally unused.

Apply this diff to improve readability:

-            for i in range(self.retry_count):
+            for _ in range(self.retry_count):


183-183: Simplify conditional assignment of node_name

You can enhance readability by rearranging the conditional expression.

Apply this diff to simplify the assignment:

-                node_name = replica.hostId if not node_name else node_name
+                node_name = node_name if node_name else replica.hostId

Alternatively, you can use the more concise:

+                node_name = node_name or replica.hostId


237-237: Simplify conditional expression in logging statement

For improved readability, rearrange the conditional expression in the log message.

Apply this diff:

-            logging(f"wait for {volume_name} replica rebuilding completed on {'all nodes' if not node_name else node_name} ... ({i})")
+            logging(f"wait for {volume_name} replica rebuilding completed on {node_name if node_name else 'all nodes'} ... ({i})")


274-275: Consistent conditional expressions in assertions and logging

To maintain consistency and enhance readability, adjust the conditional expressions in both the logging and assertion statements.

Apply this diff:

-        logging(f"Completed volume {volume_name} replica rebuilding on {'all nodes' if not node_name else node_name}")
-        assert completed, f"Expect volume {volume_name} replica rebuilding completed on {'all nodes' if not node_name else node_name}"
+        logging(f"Completed volume {volume_name} replica rebuilding on {node_name if node_name else 'all nodes'}")
+        assert completed, f"Expect volume {volume_name} replica rebuilding completed on {node_name if node_name else 'all nodes'}"

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 78eca51 and 77fab67.

📒 Files selected for processing (6)
  • e2e/keywords/volume.resource (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (3 hunks)
  • e2e/libs/volume/crd.py (2 hunks)
  • e2e/libs/volume/rest.py (4 hunks)
  • e2e/libs/volume/volume.py (2 hunks)
  • e2e/tests/regression/test_replica.robot (1 hunks)
🧰 Additional context used
🪛 Ruff
e2e/libs/volume/rest.py

140-140: Comparison to None should be cond is not None

Replace with cond is not None

(E711)


144-144: Loop control variable i not used within loop body

(B007)


183-183: Use node_name if node_name else replica.hostId instead of replica.hostId if not node_name else node_name

Replace with node_name if node_name else replica.hostId

(SIM212)


237-237: Use node_name if node_name else 'all nodes' instead of 'all nodes' if not node_name else node_name

Replace with node_name if node_name else 'all nodes'

(SIM212)


274-274: Use node_name if node_name else 'all nodes' instead of 'all nodes' if not node_name else node_name

Replace with node_name if node_name else 'all nodes'

(SIM212)


275-275: Use node_name if node_name else 'all nodes' instead of 'all nodes' if not node_name else node_name

Replace with node_name if node_name else 'all nodes'

(SIM212)

🔇 Additional comments (4)
e2e/libs/volume/volume.py (1)

104-106: LGTM: Well-structured addition of replica deletion by name.

The new method enhances the API by providing more granular control over replica deletion, which is beneficial for testing scenarios requiring precise replica management.

e2e/libs/keywords/volume_keywords.py (1)

162-162: LGTM! Code simplification improves maintainability.

The simplified implementation correctly delegates the wait operation to the volume class.

e2e/libs/volume/crd.py (1)

452-452: LGTM! Good flexibility improvement.

Making the node_name parameter optional improves the method's flexibility while maintaining backward compatibility.

e2e/libs/volume/rest.py (1)

149-156: Good practice: Checking for concurrent replica rebuilding

The addition of checks to ensure that only one replica is rebuilding at a time enhances data consistency and prevents potential conflicts.
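
As a concrete illustration of that constraint, here is a minimal, self-contained sketch of such a check. Treating mode "WO" (write-only) as the rebuilding marker is an assumption for illustration, not taken verbatim from this PR's diff:

def assert_single_rebuild(replicas):
    # At most one replica may be in the rebuilding ("WO") state at any moment.
    rebuilding = [r["name"] for r in replicas if r.get("mode") == "WO"]
    assert len(rebuilding) <= 1, \
        f"Expected at most one rebuilding replica, found: {rebuilding}"

# Usage: passes with a single rebuilding replica among healthy ("RW") ones.
assert_single_rebuild([{"name": "r1", "mode": "RW"}, {"name": "r2", "mode": "WO"}])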

Comment on lines +27 to +29
... 1. Set soft anti-affinity to `true`.
... 2. Create a volume with 1 replica.
... 3. Attach the volume and write a few hundreds MB data to it.

@coderabbitai coderabbitai bot Nov 8, 2024


⚠️ Potential issue

Documentation inconsistency with implementation.

The documentation states "Create a volume with 1 replica" but the implementation creates a volume with 3 replicas.

Either update the documentation or adjust the implementation to match:

-    ...    2. Create a volume with 1 replica.
+    ...    2. Create a volume with 3 replicas.

Contributor


Agree with this comment; it makes the test description and the test code more consistent.



@chriscchien, glad to hear that you agree. Updating the documentation step to match the implementation will ensure consistency. Here's the updated step:

    ...    2. Create a volume with 3 replicas.

Let me know if you need any further assistance.

Comment on lines +39 to +40
And Write data to volume 0



🛠️ Refactor suggestion

Add validation for data write operation.

The test writes data but doesn't verify the write operation's success before proceeding with replica operations.

 And Write data to volume 0
+And Verify data write succeeded on volume 0

Comment on lines +128 to 129
def wait_for_replica_rebuilding_complete(self, volume_name, node_name=None):
    return self.volume.wait_for_replica_rebuilding_complete(volume_name, node_name)


💡 Codebase verification

Inconsistency found in node_name parameter handling

The verification reveals inconsistencies in the node_name parameter handling across implementations:

  • base.py: All methods declare node_name as required parameter
  • rest.py: All methods correctly implement node_name=None as optional
  • crd.py: Only wait_for_replica_rebuilding_complete has node_name=None, while other rebuilding methods have required node_name
  • volume.py: All methods correctly implement node_name=None as optional

The base class and CRD implementation need to be updated to maintain consistency with the optional node_name parameter pattern.
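
A hedged sketch of that alignment, using the method names from the review (the base-class body here is illustrative, not the repository's actual base.py):

from abc import ABC, abstractmethod

class Base(ABC):
    # Give the abstract signatures the same optional node_name default that
    # rest.py and volume.py already use, so all implementations stay consistent.
    @abstractmethod
    def wait_for_replica_rebuilding_start(self, volume_name, node_name=None):
        raise NotImplementedError

    @abstractmethod
    def is_replica_rebuilding_in_progress(self, volume_name, node_name=None):
        raise NotImplementedError

    @abstractmethod
    def wait_for_replica_rebuilding_complete(self, volume_name, node_name=None):
        raise NotImplementedError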

🔗 Analysis chain

Add docstring and verify implementation consistency.

Add a docstring to explain the behavior with and without node_name parameter.

     def wait_for_replica_rebuilding_complete(self, volume_name, node_name=None):
+        """Wait for replica rebuilding to complete.
+        
+        Args:
+            volume_name: Name of the volume
+            node_name: Optional. If provided, wait for rebuilding completion on specific node.
+                      If None, wait for any replica rebuilding completion.
+        """
         return self.volume.wait_for_replica_rebuilding_complete(volume_name, node_name)

Let's verify that the underlying implementations in CRD and Rest classes handle the optional node_name consistently:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if CRD and Rest implementations handle optional node_name consistently

# Test: Search for the implementation of these methods in CRD and Rest classes
echo "Checking implementations in CRD and Rest classes..."
rg -A 10 "def (wait_for_replica_rebuilding_start|is_replica_rebuilding_in_progress|wait_for_replica_rebuilding_complete)" "e2e/libs/volume/"

Length of output: 7703

Comment on lines +131 to +137
def delete_replicas(self, volume_name, count):
    replica_list = self.replica.get(volume_name, node_name="")
    replica_names = [replica['metadata']['name'] for replica in replica_list]
    for i in range(int(count)):
        logging(f"Deleting volume {volume_name} replica volume {replica_names[i]}")
        self.volume.delete_replica_by_name(volume_name, replica_names[i])



🛠️ Refactor suggestion

Add input validation and error handling.

The method needs additional safeguards:

  1. Validate that count is a positive integer and doesn't exceed available replicas
  2. Add error handling for the delete operation

Consider this implementation:

 def delete_replicas(self, volume_name, count):
+    try:
+        count = int(count)
+        if count <= 0:
+            raise ValueError("Count must be positive")
+
         replica_list = self.replica.get(volume_name, node_name="")
         replica_names = [replica['metadata']['name'] for replica in replica_list]
+        if count > len(replica_names):
+            raise ValueError(f"Cannot delete {count} replicas, only {len(replica_names)} available")
+
         for i in range(int(count)):
             logging(f"Deleting volume {volume_name} replica volume {replica_names[i]}")
-            self.volume.delete_replica_by_name(volume_name, replica_names[i])
+            try:
+                self.volume.delete_replica_by_name(volume_name, replica_names[i])
+            except Exception as e:
+                logging(f"Failed to delete replica {replica_names[i]}: {str(e)}")
+                raise
+    except ValueError as e:
+        logging(f"Invalid input: {str(e)}")
+        raise
+    except Exception as e:
+        logging(f"Failed to delete replicas: {str(e)}")
+        raise

Comment on lines +423 to +439
def delete_replica_by_name(self, volume_name, replica_name):
    replica = self.obj_api.get_namespaced_custom_object(
        group="longhorn.io",
        version="v1beta2",
        namespace="longhorn-system",
        plural="replicas",
        name=replica_name
    )
    logging(f"Deleting replica {replica['metadata']['name']}")
    self.obj_api.delete_namespaced_custom_object(
        group="longhorn.io",
        version="v1beta2",
        namespace="longhorn-system",
        plural="replicas",
        name=replica['metadata']['name']
    )



🛠️ Refactor suggestion

Add error handling and optimize replica deletion.

The implementation could be improved in the following ways:

  1. Add error handling for non-existent replicas
  2. Remove redundant metadata access since we already have the replica name

Consider this implementation:

 def delete_replica_by_name(self, volume_name, replica_name):
-    replica = self.obj_api.get_namespaced_custom_object(
-        group="longhorn.io",
-        version="v1beta2",
-        namespace="longhorn-system",
-        plural="replicas",
-        name=replica_name
-    )
-    logging(f"Deleting replica {replica['metadata']['name']}")
-    self.obj_api.delete_namespaced_custom_object(
-        group="longhorn.io",
-        version="v1beta2",
-        namespace="longhorn-system",
-        plural="replicas",
-        name=replica['metadata']['name']
-    )
+    try:
+        logging(f"Deleting replica {replica_name}")
+        self.obj_api.delete_namespaced_custom_object(
+            group="longhorn.io",
+            version="v1beta2",
+            namespace="longhorn-system",
+            plural="replicas",
+            name=replica_name
+        )
+    except ApiException as e:
+        if e.status == 404:
+            logging(f"Replica {replica_name} not found")
+        else:
+            raise e


When Update volume 0 replica count to 5
Then Only one replica rebuilding will start at a time for volume 0
And Wait until volume 0 replicas rebuilding completed
Contributor

@chriscchien chriscchien Nov 8, 2024


Is it possible to rename the keyword Wait until volume 0 replicas rebuilding completed to emphasize that it ensures only one replica stays in the rebuilding (WO) state until the rebuild is completed, to better match the test case name Test Replica Rebuilding Per Volume Limit? Besides this, no further questions. Thank you.

Member Author


@chriscchien Could you suggest an appropriate name?

Contributor


How about Only one replica for volume 0 rebuilds at a time until the rebuild is completed for the two steps below?

 Then Only one replica rebuilding will start at a time for volume 0
 And Wait until volume 0 replicas rebuilding completed

Member Author


@chriscchien How about adding another step Monitor only one replica rebuilding will start at a time for volume 0:

Then Only one replica rebuilding will start at a time for volume 0
And Monitor only one replica rebuilding will start at a time for volume 0
And Wait until volume 0 replicas rebuilding completed

This aligns with the description in the test skeleton.

@yangchiu yangchiu requested review from chriscchien and a team November 11, 2024 00:56
@yangchiu yangchiu force-pushed the migrate-replica-rebuild-per-volume-limit branch from 77fab67 to cf7e52d Compare November 11, 2024 06:10

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (7)
e2e/tests/regression/test_replica.robot (2)

16-21: Add documentation for test variables.

Consider adding comments to explain the significance of these values, particularly:

  • Why LOOP_COUNT is set to 1
  • The reasoning behind 300 retries with 1-second intervals
  • What DATA_ENGINE=v1 represents
 *** Variables ***
+# Number of test iterations
 ${LOOP_COUNT}    1
+# Maximum attempts to wait for volume operations (5 minutes total)
 ${RETRY_COUNT}    300
+# Interval between retry attempts in seconds
 ${RETRY_INTERVAL}    1
+# Storage engine version to use
 ${DATA_ENGINE}    v1

23-34: Documentation needs additional clarity.

While the test steps are well-documented, consider adding:

  1. Expected outcomes for each step
  2. Explanation of why 5 replicas are chosen in step 4
  3. Clarification on the data size mentioned in step 3 ("few hundreds MB")
e2e/keywords/volume.resource (2)

102-104: Add documentation for the new keyword.

The implementation looks good, but adding documentation would improve maintainability.

Add documentation to describe the keyword's purpose and parameters:

 Delete ${count} replicas of volume ${volume_id}
+    [Documentation]    Deletes the specified number of replicas from a volume
+    ...    ${count} - Number of replicas to delete
+    ...    ${volume_id} - ID of the volume
     ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
     delete_replicas    ${volume_name}    ${count}

203-206: Add documentation for the replica rebuilding constraint keyword.

The implementation correctly validates the "one at a time" rebuilding constraint.

Add documentation to clarify the keyword's purpose:

 Only one replica rebuilding will start at a time for volume ${volume_id}
+    [Documentation]    Validates that only one replica rebuilding process occurs at a time for the specified volume
+    ...    ${volume_id} - ID of the volume to monitor
     ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
     only_one_replica_rebuilding_will_start_at_a_time    ${volume_name}
e2e/libs/volume/rest.py (3)

140-140: Fix comparison with None.

Use is not None instead of != None for Python identity comparison.

-        assert rebuilding_replica_name != None, f"Waiting for replica rebuilding start for volume {volume_name} on node {node_name} failed: replicas = {v.replicas}"
+        assert rebuilding_replica_name is not None, f"Waiting for replica rebuilding start for volume {volume_name} on node {node_name} failed: replicas = {v.replicas}"


183-183: Simplify node_name assignment.

The current expression can be simplified for better readability.

-            node_name = replica.hostId if not node_name else node_name
+            node_name = node_name if node_name else replica.hostId


237-237: Simplify string interpolation in log messages.

Multiple instances of string interpolation can be simplified for better readability.

-            logging(f"wait for {volume_name} replica rebuilding completed on {'all nodes' if not node_name else node_name} ... ({i})")
+            logging(f"wait for {volume_name} replica rebuilding completed on {node_name if node_name else 'all nodes'} ... ({i})")

-        logging(f"Completed volume {volume_name} replica rebuilding on {'all nodes' if not node_name else node_name}")
+        logging(f"Completed volume {volume_name} replica rebuilding on {node_name if node_name else 'all nodes'}")

-        assert completed, f"Expect volume {volume_name} replica rebuilding completed on {'all nodes' if not node_name else node_name}"
+        assert completed, f"Expect volume {volume_name} replica rebuilding completed on {node_name if node_name else 'all nodes'}"

Also applies to: 274-274, 275-275


📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 77fab67 and cf7e52d.

📒 Files selected for processing (6)
  • e2e/keywords/volume.resource (3 hunks)
  • e2e/libs/keywords/volume_keywords.py (3 hunks)
  • e2e/libs/volume/crd.py (2 hunks)
  • e2e/libs/volume/rest.py (4 hunks)
  • e2e/libs/volume/volume.py (2 hunks)
  • e2e/tests/regression/test_replica.robot (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • e2e/libs/keywords/volume_keywords.py
  • e2e/libs/volume/crd.py
  • e2e/libs/volume/volume.py

🔇 Additional comments (5)
e2e/tests/regression/test_replica.robot (3)

1-15: LGTM: Well-structured test setup with comprehensive resources.

The test framework setup includes all necessary resources and proper test lifecycle hooks.


35-51: Enhance test robustness with additional validations.

  1. Consider adding volume state validation after each major operation:
 When Update volume 0 replica count to 5
+And Verify volume 0 is attached
 Then Only one replica rebuilding will start at a time for volume 0
  2. As suggested in the previous review, consider renaming the keywords to better reflect their purpose:
-Then Only one replica rebuilding will start at a time for volume 0
-And Wait until volume 0 replicas rebuilding completed
+Then Only one replica for volume 0 rebuilds at a time until the rebuild is completed
+And Wait for volume 0 replica rebuild completion

1-51: Verify complete migration from pytest.

Let's ensure all functionality from the original pytest test has been properly migrated.

✅ Verification successful

Migration verification completed - test functionality matches

The Robot Framework test implementation successfully covers all the key functionality from the original pytest version:

  1. Both tests verify the same core scenario of one-replica-at-a-time rebuilding:

    • Setting soft anti-affinity
    • Creating volume with initial replicas
    • Writing test data
    • Scaling to 5 replicas
    • Monitoring rebuild states
    • Deleting 4 replicas
    • Re-verifying rebuild behavior
    • Data integrity check
  2. The Robot test actually improves the verification by adding explicit steps to:

    • Monitor the single-replica rebuilding constraint
    • Wait for rebuild completion
    • Verify volume health

The migration has maintained full test coverage while making the test steps more explicit and readable in Robot Framework's format.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Compare pytest and robot test implementations
# to ensure complete migration

# Find the original pytest file
echo "Looking for original pytest file..."
rg -l "test_replica_rebuild_per_volume_limit" --type py

# Search for any remaining pytest references to this test
echo "Checking for remaining pytest references..."
rg "test_replica_rebuild_per_volume_limit" --type py -C 5

Length of output: 1172


Script:

#!/bin/bash
# Get the full implementation of the pytest test for comparison
echo "Getting pytest implementation..."
rg -A 50 "def test_replica_rebuild_per_volume_limit" manager/integration/tests/test_scheduling.py

# Check for any helper functions used in the pytest implementation
echo "Checking for helper functions..."
rg "def (check_volume_data|cleanup_volume|wait_for_rebuild_complete)" manager/integration/tests/test_scheduling.py -A 5

Length of output: 2396

e2e/libs/volume/rest.py (2)

Line range hint 121-171: LGTM! Good addition of concurrent rebuilding check.

The changes to make node_name optional and the addition of monitoring for concurrent rebuilding (lines 149-155) improve the robustness of the test framework.

🧰 Tools
🪛 Ruff

140-140: Comparison to None should be cond is not None

Replace with cond is not None

(E711)


144-144: Loop control variable i not used within loop body

(B007)


234-270: LGTM! Well-documented rebuild status handling.

The changes effectively handle both node-specific and all-node rebuilding scenarios. The comments about rebuildStatus reliability (lines 251-258) provide valuable context for future maintainers.
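
To make the node-filtering idea concrete, here is a minimal, self-contained sketch of such a completion check. The entry fields used below (replica name, state, and a replica-to-node mapping) are illustrative assumptions, not field names verified against the Longhorn API:

def rebuilding_complete(rebuild_statuses, node_of, node_name=None):
    # Completion holds when no matching rebuild entry is still in progress.
    relevant = [s for s in rebuild_statuses
                if node_name is None or node_of.get(s["replica"]) == node_name]
    return all(s["state"] == "complete" for s in relevant)

# Usage: an in-progress rebuild on node-1 blocks the all-nodes check,
# while the node-2 check already passes.
statuses = [{"replica": "r1", "state": "in_progress"},
            {"replica": "r2", "state": "complete"}]
nodes = {"r1": "node-1", "r2": "node-2"}
print(rebuilding_complete(statuses, nodes))            # False
print(rebuilding_complete(statuses, nodes, "node-2"))  # True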


Comment on lines +185 to +188
Monitor only one replica rebuilding will start at a time for volume ${volume_id}
    ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
    wait_for_replica_rebuilding_to_complete    ${volume_name}



⚠️ Potential issue

Keyword implementation may not match its intended purpose.

The keyword name suggests monitoring that only one replica rebuilds at a time, but the implementation only waits for rebuilding completion. This might not effectively validate the "one at a time" constraint.

Consider implementing proper monitoring logic:

 Monitor only one replica rebuilding will start at a time for volume ${volume_id}
     ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
-    wait_for_replica_rebuilding_to_complete   ${volume_name}
+    only_one_replica_rebuilding_will_start_at_a_time    ${volume_name}


@yangchiu yangchiu requested review from chriscchien and removed request for chriscchien November 11, 2024 06:33
Contributor

@chriscchien chriscchien left a comment


LGTM
