-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test(robot): v2 volume should cleanup resources when instance manager is deleted #2199
test(robot): v2 volume should cleanup resources when instance manager is deleted #2199
Conversation
WalkthroughThe pull request introduces enhancements to Longhorn's end-to-end testing framework, focusing on node management, device handling, and test coverage. New keywords and methods have been added to facilitate better testing of volume and node operations, particularly around device management and instance manager interactions. The changes include methods for listing devices, retrieving instance managers, and adding new test scenarios to validate resource cleanup and node management processes. Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (11)
🚧 Files skipped from review as they are similar to previous changes (8)
👮 Files not reviewed due to content moderation or server errors (3)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
35896a0
to
954f69a
Compare
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
e2e/libs/keywords/workload_keywords.py (1)
49-49
: Consider standardizing logging format across similar methods.For consistency, consider adding namespace information to logs in related methods like
create_pod()
. This would maintain a uniform logging pattern throughout the class.Example for
create_pod()
:def create_pod(self, pod_name, claim_name): - logging(f'Creating pod {pod_name} using pvc {claim_name}') + logging(f'Creating pod {pod_name} using pvc {claim_name} in namespace {namespace}') create_pod(new_busybox_manifest(pod_name, claim_name))e2e/tests/regression/test_v2.robot (2)
82-93
: Consider adding cleanup verification after test completion.While the test properly sets up the volumes and writes data, it would be beneficial to verify that all resources are properly cleaned up after the test, even in failure scenarios.
Consider adding these verifications in the test teardown:
- Verify no orphaned devices remain
- Ensure instance managers are properly terminated
- Check for any leaked resources
95-103
: Consider adding error handling for node operations.The test should handle potential failures in node cordoning and instance manager deletion gracefully.
Consider:
- Adding timeout and retry logic for node operations
- Verifying node state before proceeding with device checks
- Adding cleanup steps if the test fails at this stage
e2e/libs/node/node.py (2)
291-294
: Consider adding error handling for dmsetup command.While the implementation is clean and focused, it could benefit from error handling in case the
dmsetup
command is not available on the node.def list_dm_devices(self, node_name): - cmd = "dmsetup ls | awk '{print $1}'" + cmd = "if command -v dmsetup >/dev/null 2>&1; then dmsetup ls | awk '{print $1}'; else echo 'dmsetup command not found' >&2; exit 1; fi" res = NodeExec(node_name).issue_cmd(cmd) return res
296-299
: Consider adding error handling for missing directory.While the implementation is clean and focused, it could benefit from error handling in case the Longhorn device directory doesn't exist.
def list_volume_devices(self, node_name): - cmd = "ls /dev/longhorn/" + cmd = "if [ -d /dev/longhorn/ ]; then ls /dev/longhorn/; else echo 'Longhorn device directory not found' >&2; exit 1; fi" res = NodeExec(node_name).issue_cmd(cmd) return rese2e/keywords/volume.resource (1)
392-402
: LGTM: Well-structured volume device assertion keyword.The keyword follows the same pattern as the DM device assertion:
- Consistent implementation style
- Clear condition handling
- Proper error handling
Consider adding documentation to clarify the expected format of device names and the difference between DM devices and volume devices.
e2e/tests/negative/node_drain.robot (1)
Line range hint
1-220
: Well-structured test suite with comprehensive coverage.The test suite demonstrates excellent test design practices:
- Clear documentation for each test case with detailed steps
- Proper test setup and teardown
- Comprehensive coverage of node drain scenarios
- Good use of Robot Framework's Given-When-Then style
- Proper validation of data integrity after operations
Consider adding the following test scenarios if not covered elsewhere:
- Node drain during volume expansion
- Node drain with multiple concurrent workloads
- Node drain during backup/restore operations
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
e2e/keywords/k8s.resource
(2 hunks)e2e/keywords/longhorn.resource
(2 hunks)e2e/keywords/volume.resource
(1 hunks)e2e/libs/backing_image/crd.py
(6 hunks)e2e/libs/keywords/node_keywords.py
(1 hunks)e2e/libs/keywords/volume_keywords.py
(3 hunks)e2e/libs/keywords/workload_keywords.py
(1 hunks)e2e/libs/node/node.py
(1 hunks)e2e/libs/node_exec/node_exec.py
(2 hunks)e2e/tests/negative/node_drain.robot
(1 hunks)e2e/tests/regression/test_v2.robot
(2 hunks)
✅ Files skipped from review due to trivial changes (1)
- e2e/libs/backing_image/crd.py
🔇 Additional comments (14)
e2e/libs/keywords/workload_keywords.py (1)
49-49
: LGTM! Enhanced logging with namespace context.
The addition of namespace information to the log message improves observability and debugging capabilities, especially useful for tracking pod deletions across different namespaces.
e2e/libs/keywords/node_keywords.py (1)
79-83
: LGTM! Clean implementation of device listing methods.
The new methods follow the established pattern in the class, providing clear interfaces to the underlying node functionality.
e2e/keywords/longhorn.resource (1)
72-73
: LGTM! Good refactoring of instance manager deletion.
The changes simplify the logic by directly retrieving the instance manager, making the code more maintainable and reducing duplication.
Also applies to: 78-79
e2e/tests/regression/test_v2.robot (2)
75-81
: LGTM! Well-structured test case with clear documentation.
The test case effectively validates the resource cleanup behavior with good coverage of both v1 and v2 volumes.
104-116
: Verify volume health check timeouts.
The test waits for volumes to become healthy but doesn't specify timeouts. This could lead to test hangs in failure scenarios.
Let's check the default timeout values:
✅ Verification successful
Let me search for the timeout configurations in the robot framework files.
Let me search for the implementation of the volume health check wait function.
Let me check the implementation of the volume status wait function to find the timeout configuration.
Let me check the values of the timeout constants used in the wait function.
Based on the findings, I can now generate the final response:
The volume health check has appropriate timeout configurations.
The wait_for_volume_healthy function uses the following timeout settings:
- Default retry count: 150 attempts (RETRY_COUNTS)
- Long retry count: 360 attempts (RETRY_COUNTS_LONG)
- Retry interval: 1 second (RETRY_INTERVAL)
This means the health check will wait up to:
- 150 seconds (2.5 minutes) by default
- 360 seconds (6 minutes) for long operations
These timeouts are sufficient to handle normal volume operations while preventing indefinite hangs. The test framework also uses these same constants consistently across different test scenarios.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for volume health check timeout configurations
rg -A 2 "wait.*health" --glob '*.{py,robot,resource}'
Length of output: 89535
Script:
#!/bin/bash
# Search for volume health check timeout configurations in robot files
rg -A 2 "Wait for volume.*healthy" --glob '*.robot'
Length of output: 25981
Script:
#!/bin/bash
# Search for the implementation of volume health check wait function
rg -A 5 "def wait_for_volume.*healthy" --glob '*.{py,robot}'
Length of output: 2125
Script:
#!/bin/bash
# Search for the implementation of wait_for_volume_status function
rg -A 10 "def wait_for_volume_status" --glob '*.{py,robot}'
Length of output: 905
Script:
#!/bin/bash
# Search for the definition of RETRY constants
rg "RETRY_COUNTS|RETRY_INTERVAL|RETRY_COUNTS_LONG" --glob '*.py'
Length of output: 40264
e2e/libs/node_exec/node_exec.py (2)
54-54
: LGTM! Improved log readability.
The addition of a newline character enhances log readability by clearly separating the command result from the log message.
103-108
: LGTM! Essential toleration for cordoned nodes.
The added toleration is crucial as it enables command execution on cordoned nodes, which is necessary for node management operations. The configuration follows Kubernetes best practices.
e2e/keywords/k8s.resource (2)
75-77
: LGTM! Well-structured node management keyword.
The new keyword follows consistent patterns with other node management keywords and provides essential functionality for uncordoning nodes by their index.
124-125
: LGTM! Fixed argument name typo.
Corrected the argument name from instance_manger
to instance_manager
for better consistency.
e2e/libs/keywords/volume_keywords.py (3)
17-17
: LGTM: Clean import addition.
Added import for VolumeRest to support the new instance manager retrieval functionality.
74-78
: LGTM: Well-implemented instance manager retrieval.
The new method get_volume_instance_manager
is well-implemented with proper validation:
- Retrieves volume details using VolumeRest
- Validates single controller existence
- Returns the instance manager name
133-133
: LGTM: Clean formatting.
Removed unnecessary line break, improving code readability.
e2e/keywords/volume.resource (1)
380-390
: LGTM: Well-structured DM device assertion keyword.
The keyword implementation is clean and includes:
- Proper variable resolution
- Clear condition handling
- Appropriate error handling for invalid conditions
e2e/tests/negative/node_drain.robot (1)
220-220
: Fix typo in variable name.
Good catch on fixing the typo from instance_manger
to instance_manager
. This maintains consistency with the variable naming across the codebase.
954f69a
to
4ba5b19
Compare
4ba5b19
to
75b888b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
e2e/tests/regression/test_v2.robot (1)
99-118
: LGTM: Comprehensive test assertions with proper validation flowThe test case thoroughly validates the cleanup process by:
- Verifying device cleanup for affected volumes
- Ensuring v1 volumes remain unaffected
- Confirming data integrity after recovery
- Using appropriate assertions for both positive and negative cases
The loop structure ensures reliability by testing multiple iterations.
Consider adding error handling for node operations to make the test more robust in case of infrastructure issues:
- Timeout handling for cordon/uncordon operations
- Retry mechanism for health checks
- Cleanup steps in case of intermediate failures
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
e2e/keywords/k8s.resource
(2 hunks)e2e/keywords/volume.resource
(1 hunks)e2e/libs/keywords/node_keywords.py
(1 hunks)e2e/libs/node/node.py
(1 hunks)e2e/libs/node_exec/node_exec.py
(2 hunks)e2e/tests/regression/test_v2.robot
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- e2e/libs/node_exec/node_exec.py
- e2e/libs/node/node.py
- e2e/libs/keywords/node_keywords.py
- e2e/keywords/k8s.resource
- e2e/keywords/volume.resource
🔇 Additional comments (4)
e2e/tests/regression/test_v2.robot (4)
17-17
: LGTM: Resource import is properly placed
The addition of k8s.resource
is well-organized with other resource imports and provides necessary node management keywords for the new test case.
75-81
: LGTM: Well-structured test case with clear documentation
The test case is properly structured with:
- Clear, descriptive name
- Appropriate test tag
- Comprehensive documentation explaining the purpose and linking to the related issue
82-94
: LGTM: Comprehensive test setup with proper volume initialization
The setup creates a robust test environment by:
- Creating both v2 and v1 volumes for comparison
- Properly waiting for volume health
- Including data writing for validation
95-98
: Verify instance manager deletion success
While the test flow is logical, it would be beneficial to verify that the instance manager deletion was successful before proceeding with the assertions.
Consider adding a verification step after line 97:
When Cordon node 0
And Delete instance-manager of volume 0
+And Wait for instance-manager deletion complete
Signed-off-by: Chin-Ya Huang <[email protected]>
Signed-off-by: Chin-Ya Huang <[email protected]>
Signed-off-by: Chin-Ya Huang <[email protected]>
Signed-off-by: Chin-Ya Huang <[email protected]>
… is deleted longhorn/longhorn-9959 longhorn/longhorn-9989 Signed-off-by: Chin-Ya Huang <[email protected]>
75b888b
to
c7d5648
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I have verified longhorn/longhorn#9959 without any problem.
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9959, longhorn/longhorn#9989
What this PR does / why we need it:
Introduce a new robot test case.
Special notes for your reviewer:
None
Additional documentation or context
None
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Style