-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test(robot): migrate test_soft_anti_affinity_scheduling #2158
test(robot): migrate test_soft_anti_affinity_scheduling #2158
Conversation
Signed-off-by: Yang Chiu <[email protected]>
WalkthroughThe pull request introduces several enhancements to the Kubernetes-related keywords and functionality. A new keyword, Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (4)
e2e/tests/regression/test_scheduling.robot (2)
17-22
: Consider adding documentation for the variables.While the variables are well-defined, adding documentation to explain their significance and chosen values would improve maintainability. For example, document why 300 retries with 1s interval was chosen for the retry parameters.
*** Variables *** +# Maximum number of test iterations ${LOOP_COUNT} 1 +# Maximum number of retries for async operations (5 minutes total with 1s interval) ${RETRY_COUNT} 300 +# Interval between retries in seconds ${RETRY_INTERVAL} 1 +# Data engine version to use for the volume ${DATA_ENGINE} v1
40-53
: Consider adding additional verifications for completeness.While the test covers the core functionality, consider adding these verifications:
- Verify node schedulability status after cordoning
- Validate the replica count matches expectations
And Cordon node 1 + And Verify node 1 is unschedulable And Delete volume 0 replica on node 1 Then Wait until volume 0 replicas rebuilding completed And Wait for volume 0 healthy + And Verify volume 0 has 3 replicas And Check volume 0 data is intacte2e/keywords/k8s.resource (1)
72-75
: Add documentation for the new keyword.Consider adding documentation to describe the purpose and parameters of this keyword, following the Robot Framework documentation style.
+[Documentation] Cordons a Kubernetes node specified by its index (0-based). +... The keyword will fail if the node cannot be cordoned or +... if the node remains schedulable after cordoning. +[Arguments] ${node_id} Cordon node ${node_id} ${node_name} = get_node_by_index ${node_id} cordon_node ${node_name} check_node_is_not_schedulable ${node_name}e2e/libs/keywords/volume_keywords.py (1)
Line range hint
191-211
: Add timeout and error handling to the async method.While the implementation is correct, consider adding timeout and error handling for robustness:
async def wait_for_both_replica_rebuildings(): tasks = [ asyncio.create_task(self.volume.wait_for_replica_rebuilding_start(volume_name_0, node_id), name=volume_name_0), asyncio.create_task(self.volume.wait_for_replica_rebuilding_start(volume_name_1, node_id), name=volume_name_1) ] - done, pending = await asyncio.wait(tasks, return_when=asyncio.ALL_COMPLETED) - logging(f"Observed {done.pop().get_name()} and {done.pop().get_name()} started replica rebuilding first") + try: + done, pending = await asyncio.wait( + tasks, + timeout=300, # 5 minutes timeout + return_when=asyncio.ALL_COMPLETED + ) + if pending: + for task in pending: + task.cancel() + raise TimeoutError(f"Timeout waiting for replica rebuilding to start") + logging(f"Observed {done.pop().get_name()} and {done.pop().get_name()} started replica rebuilding") + except Exception as e: + for task in tasks: + task.cancel() + raise RuntimeError(f"Failed to wait for replica rebuilding: {str(e)}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
e2e/keywords/k8s.resource
(1 hunks)e2e/libs/keywords/volume_keywords.py
(1 hunks)e2e/libs/node/node.py
(1 hunks)e2e/tests/regression/test_scheduling.robot
(1 hunks)
🔇 Additional comments (6)
e2e/tests/regression/test_scheduling.robot (2)
1-16
: LGTM! Well-structured test setup with proper resource imports.
The settings section follows Robot Framework best practices with clear documentation, appropriate tags, and comprehensive resource imports. The test setup and teardown ensure proper test isolation.
24-39
: LGTM! Well-documented test case with clear BDD structure.
The test case documentation clearly explains the purpose and steps, following BDD best practices. The comments about node scheduling vs. cordoning provide valuable context for future maintainers.
e2e/keywords/k8s.resource (2)
72-75
: Verify consistency with existing cordon implementations.
The implementation looks correct and follows the established pattern. Let's verify it's being used consistently across test cases.
#!/bin/bash
# Check for other cordon implementations and usage patterns
rg -A 5 "cordon_node|check_node_is_not_schedulable" --type robot
72-75
: Consider adding error handling for invalid node indices.
The keyword might fail with unclear errors if an invalid node_id is provided. Consider adding validation or proper error handling.
Let's verify the error handling in the underlying implementation:
e2e/libs/node/node.py (1)
172-175
: Consider consolidating node schedulability methods.
The new is_node_schedulable
method appears to be a non-assertive version of the existing check_node_schedulable
method. This is good for separation of concerns, as it allows other components to check the status without forcing assertions.
Let's verify the usage of both methods to ensure they serve different purposes:
e2e/libs/keywords/volume_keywords.py (1)
Line range hint 163-189
: LGTM! Well-implemented async method.
The implementation follows asyncio best practices with proper task management, cancellation, and clear naming.
def is_node_schedulable(self, node_name): | ||
node = get_longhorn_client().by_id_node(node_name) | ||
return node["conditions"]["Schedulable"]["status"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Add error handling, type hints, and documentation.
While the implementation is functionally correct, consider these improvements for better maintainability and reliability:
- def is_node_schedulable(self, node_name):
- node = get_longhorn_client().by_id_node(node_name)
- return node["conditions"]["Schedulable"]["status"]
+ def is_node_schedulable(self, node_name: str) -> bool:
+ """Check if a node is schedulable without asserting the status.
+
+ Args:
+ node_name: The name of the node to check.
+
+ Returns:
+ bool: True if the node is schedulable, False otherwise.
+
+ Raises:
+ KeyError: If the node's conditions are not properly structured.
+ """
+ node = get_longhorn_client().by_id_node(node_name)
+ try:
+ return node["conditions"]["Schedulable"]["status"]
+ except KeyError as e:
+ raise KeyError(f"Failed to get schedulable status for node {node_name}: {e}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def is_node_schedulable(self, node_name): | |
node = get_longhorn_client().by_id_node(node_name) | |
return node["conditions"]["Schedulable"]["status"] | |
def is_node_schedulable(self, node_name: str) -> bool: | |
"""Check if a node is schedulable without asserting the status. | |
Args: | |
node_name: The name of the node to check. | |
Returns: | |
bool: True if the node is schedulable, False otherwise. | |
Raises: | |
KeyError: If the node's conditions are not properly structured. | |
""" | |
node = get_longhorn_client().by_id_node(node_name) | |
try: | |
return node["conditions"]["Schedulable"]["status"] | |
except KeyError as e: | |
raise KeyError(f"Failed to get schedulable status for node {node_name}: {e}") |
if self.node.is_node_schedulable(node_name) == "False": | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Improve the node schedulability check condition.
The current string comparison with "False" is fragile. Consider these improvements:
- if self.node.is_node_schedulable(node_name) == "False":
- continue
+ if not self.node.is_node_schedulable(node_name):
+ logging(f"Skipping node {node_name} as it is not schedulable")
+ continue
Also, consider adding a unit test to verify this behavior.
Would you like me to help create a unit test for this scenario?
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if self.node.is_node_schedulable(node_name) == "False": | |
continue | |
if not self.node.is_node_schedulable(node_name): | |
logging(f"Skipping node {node_name} as it is not schedulable") | |
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9775
What this PR does / why we need it:
migrate test_soft_anti_affinity_scheduling
Special notes for your reviewer:
Additional documentation or context
Summary by CodeRabbit
Release Notes
New Features
Improvements
Testing