test(robot): add node down during migration test cases #2208
Open
yangchiu wants to merge 1 commit into longhorn:master from yangchiu:migration-rollback-after-migration-node-down
@@ -37,3 +37,70 @@ Migration Confirmation After Migration Node Down
    Then Wait for volume 0 to migrate to node 1
    And Wait for volume 0 healthy
    And Check volume 0 data is intact

Migration Rollback After Migration Node Down
    Given Create volume 0 with migratable=True accessMode=RWX dataEngine=${DATA_ENGINE}
    And Attach volume 0 to node 0
    And Wait for volume 0 healthy
    And Write data to volume 0

    And Attach volume 0 to node 1
    And Wait for volume 0 migration to be ready

    # power off migration node
    When Power off node 1
    # migration rollback by detaching from the migration node
    And Detach volume 0 from node 1

    # migration rollback succeeds
    Then Wait for volume 0 to stay on node 0
    And Wait for volume 0 degraded
    And Check volume 0 data is intact

Migration Confirmation After Original Node Down
    Given Create volume 0 with migratable=True accessMode=RWX dataEngine=${DATA_ENGINE}
    And Attach volume 0 to node 0
    And Wait for volume 0 healthy
    And Write data to volume 0

    And Attach volume 0 to node 1
    And Wait for volume 0 migration to be ready

    # power off original node
    When Power off node 0
    # migration confirmation by detaching from the original node
    And Detach volume 0 from node 0

    # the migration is stuck until the Kubernetes pod eviction controller decides to
    # terminate the instance-manager pod that was running on the original node;
    # then Longhorn detaches the volume and cleanly reattaches it to the migration node
    Then Wait for volume 0 to migrate to node 1
    And Wait for volume 0 degraded
    And Check volume 0 data is intact

Migration Rollback After Original Node Down
    Given Create volume 0 with migratable=True accessMode=RWX dataEngine=${DATA_ENGINE}
    And Attach volume 0 to node 0
    And Wait for volume 0 healthy
    And Write data to volume 0

    And Attach volume 0 to node 1
    And Wait for volume 0 migration to be ready

    # power off original node
    When Power off node 0
    # migration rollback by detaching from the migration node
    And Detach volume 0 from node 1

    # the migration is stuck until the Kubernetes pod eviction controller decides to
    # terminate the instance-manager pod that was running on the original node;
    # then Longhorn detaches the volume and attempts to cleanly reattach it to the original node,
    # but it is stuck in attaching until the node comes back
    Then Check volume 0 kept in attaching

    # power on original node
    When Power on off nodes

    Then Wait for volume 0 to stay on node 0
    And Wait for volume 0 healthy
    And Check volume 0 data is intact
Comment on lines +81 to +106

Ensure proper test isolation and cleanup for node power operations. The test manipulates node power state, which could affect other tests. Consider adding these steps to the test teardown:

    [Teardown]
    Power on off nodes
    Wait for all nodes ready timeout=300
    Cleanup test resources
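As a side note, a minimal sketch of how that teardown could be written in valid Robot Framework syntax, assuming the keywords named in the suggestion (Power on off nodes, Wait for all nodes ready, Cleanup test resources) actually exist in the shared keyword libraries; [Teardown] accepts a single keyword call, so the steps are chained with Run Keywords and AND:

    Migration Rollback After Original Node Down
        # chain the cleanup steps, since a teardown is a single keyword call
        [Teardown]    Run Keywords    Power on off nodes
        ...    AND    Wait for all nodes ready    timeout=300
        ...    AND    Cleanup test resources
        # ... existing test steps unchanged ...

The same chain could instead be set once for the whole suite via Test Teardown in the *** Settings *** section.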
Do we need to detach explicitly from node 0? Should we just wait for the instance-manager to terminate?
The test case is Migration Confirmation After Original Node Down, so we should detach from the original node to confirm the migration.

If no action were taken after the original node went down, that would be a different test case, something like Original Node Down After Migration Ready. I just tested that scenario: after the migration was ready, I powered off the original node and then did nothing. Eventually the volume ended up fully detached. Even after powering the original node back on, the volume remains in the detached state permanently.

supportbundle_ef8a6972-0c68-48f9-bd45-1e575e519d95_2024-12-19T00-29-41Z.zip

If we need this test case, @derekbit and @PhanLe1010 need to confirm whether this is the expected behavior first.
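For reference, a rough sketch of what that case might look like, reusing the keywords already used in this file; the final keyword Wait for volume 0 detached is hypothetical shorthand for whatever detached-state check the suite actually provides:

    Original Node Down After Migration Ready
        Given Create volume 0 with migratable=True accessMode=RWX dataEngine=${DATA_ENGINE}
        And Attach volume 0 to node 0
        And Wait for volume 0 healthy
        And Write data to volume 0

        And Attach volume 0 to node 1
        And Wait for volume 0 migration to be ready

        # power off the original node and take no further action
        When Power off node 0

        # per the observation above, Longhorn eventually gives up the migration
        # and the volume ends up fully detached, even after node 0 is powered back on
        Then Wait for volume 0 detached    # hypothetical keyword name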
Yes, this is expected behavior. The volume will remain detached until the CSI flow or the user detaches it from the original node. This is the new design: when the volume crashes, the migration is stopped and the volume stays detached, waiting for the user/CSI to decide which single node Longhorn should attach it to. The reasoning is that the volume has already crashed, so live migration is no longer needed; this reduces the risk of unnecessary and chaotic migrations.
Ref longhorn/longhorn#8735 (comment)
Manual test case is updated at #1948
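For illustration only, a hedged continuation of the sketch above showing how the stalemate is resolved under that design: once the volume has dropped to detached, an explicit detach from one of the two nodes is the user/CSI decision that tells Longhorn which single node to attach to. The detached/attached check keywords here are assumptions, not confirmed suite keywords:

        # volume crashed, migration stopped; it stays detached until the user/CSI decides
        Then Wait for volume 0 detached    # assumed keyword
        # dropping the ticket for the failed original node selects node 1 as the target
        When Detach volume 0 from node 0
        Then Wait for volume 0 attached to node 1    # assumed keyword
        And Check volume 0 data is intact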