Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(robot): migrate test_soft_anti_affinity_scheduling #2158

Merged
merged 1 commit into from
Nov 7, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions e2e/keywords/k8s.resource
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,11 @@ Drain volume of ${workload_kind} ${workload_id} volume node
Uncordon the drained node
uncordon_node ${drained_node}

Cordon node ${node_id}
${node_name} = get_node_by_index ${node_id}
cordon_node ${node_name}
check_node_is_not_schedulable ${node_name}

Cordon ${workload_kind} ${workload_id} volume node
${workload_name} = generate_name_with_suffix ${workload_kind} ${workload_id}
${volume_name} = get_workload_volume_name ${workload_name}
Expand Down
2 changes: 2 additions & 0 deletions e2e/libs/keywords/volume_keywords.py
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,8 @@ def wait_for_replica_rebuilding_to_complete_on_node(self, volume_name, replica_l

def wait_for_replica_rebuilding_to_complete(self, volume_name):
for node_name in self.node.list_node_names_by_role("worker"):
if self.node.is_node_schedulable(node_name) == "False":
continue
Comment on lines +156 to +157
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve the node schedulability check condition.

The current string comparison with "False" is fragile. Consider these improvements:

-            if self.node.is_node_schedulable(node_name) == "False":
-                continue
+            if not self.node.is_node_schedulable(node_name):
+                logging(f"Skipping node {node_name} as it is not schedulable")
+                continue

Also, consider adding a unit test to verify this behavior.

Would you like me to help create a unit test for this scenario?

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if self.node.is_node_schedulable(node_name) == "False":
continue
if not self.node.is_node_schedulable(node_name):
logging(f"Skipping node {node_name} as it is not schedulable")
continue

logging(f"Waiting for volume {volume_name}'s replica on node {node_name} rebuilding completed")
self.volume.wait_for_replica_rebuilding_complete(volume_name, node_name)

Expand Down
4 changes: 4 additions & 0 deletions e2e/libs/node/node.py
Original file line number Diff line number Diff line change
Expand Up @@ -169,3 +169,7 @@ def check_node_schedulable(self, node_name, schedulable):
break
time.sleep(self.retry_interval)
assert node["conditions"]["Schedulable"]["status"] == schedulable

def is_node_schedulable(self, node_name):
node = get_longhorn_client().by_id_node(node_name)
return node["conditions"]["Schedulable"]["status"]
Comment on lines +173 to +175
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling, type hints, and documentation.

While the implementation is functionally correct, consider these improvements for better maintainability and reliability:

-    def is_node_schedulable(self, node_name):
-        node = get_longhorn_client().by_id_node(node_name)
-        return node["conditions"]["Schedulable"]["status"]
+    def is_node_schedulable(self, node_name: str) -> bool:
+        """Check if a node is schedulable without asserting the status.
+
+        Args:
+            node_name: The name of the node to check.
+
+        Returns:
+            bool: True if the node is schedulable, False otherwise.
+
+        Raises:
+            KeyError: If the node's conditions are not properly structured.
+        """
+        node = get_longhorn_client().by_id_node(node_name)
+        try:
+            return node["conditions"]["Schedulable"]["status"]
+        except KeyError as e:
+            raise KeyError(f"Failed to get schedulable status for node {node_name}: {e}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def is_node_schedulable(self, node_name):
node = get_longhorn_client().by_id_node(node_name)
return node["conditions"]["Schedulable"]["status"]
def is_node_schedulable(self, node_name: str) -> bool:
"""Check if a node is schedulable without asserting the status.
Args:
node_name: The name of the node to check.
Returns:
bool: True if the node is schedulable, False otherwise.
Raises:
KeyError: If the node's conditions are not properly structured.
"""
node = get_longhorn_client().by_id_node(node_name)
try:
return node["conditions"]["Schedulable"]["status"]
except KeyError as e:
raise KeyError(f"Failed to get schedulable status for node {node_name}: {e}")

53 changes: 53 additions & 0 deletions e2e/tests/regression/test_scheduling.robot
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
*** Settings ***
Documentation Scheduling Test Cases

Test Tags regression

Resource ../keywords/common.resource
Resource ../keywords/volume.resource
Resource ../keywords/setting.resource
Resource ../keywords/deployment.resource
Resource ../keywords/persistentvolumeclaim.resource
Resource ../keywords/workload.resource
Resource ../keywords/k8s.resource

Test Setup Set test environment
Test Teardown Cleanup test resources

*** Variables ***
${LOOP_COUNT} 1
${RETRY_COUNT} 300
${RETRY_INTERVAL} 1
${DATA_ENGINE} v1

*** Test Cases ***
Test Soft Anti Affinity Scheduling
[Tags] coretest
[Documentation] Test that volumes with Soft Anti-Affinity work as expected.
...
... With Soft Anti-Affinity, a new replica should still be scheduled on a node
... with an existing replica, which will result in "Healthy" state but limited
... redundancy.
...
... 1. Create a volume and attach to the current node
... 2. Generate and write `data` to the volume.
... 3. Set `soft anti-affinity` to true
... 4. Disable current node's scheduling.
... 5. Remove the replica on the current node
... 6. Wait for the volume to complete rebuild. Volume should have 3 replicas.
... 7. Verify `data`
Given Create volume 0 with numberOfReplicas=3 dataEngine=${DATA_ENGINE}
And Attach volume 0
And Wait for volume 0 healthy
And Write data to volume 0

When Set setting replica-soft-anti-affinity to true
# disabling scheduling on a node only sets the node status to "Disable", not "Unschedulable"
# therefore disabling scheduling doesn't alter the node["conditions"]["Schedulable"]["status"] field
# only cordoning a node can set it to "Unschedulable"
And Cordon node 1
And Delete volume 0 replica on node 1

Then Wait until volume 0 replicas rebuilding completed
And Wait for volume 0 healthy
And Check volume 0 data is intact