-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(backup): fix test cases for multiple backup stores #2202
Conversation
Note Reviews pausedUse the following commands to manage reviews:
Warning Rate limit exceeded@mantissahz has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 10 minutes and 10 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (10)
WalkthroughThe pull request introduces modifications to several test integration files within the Longhorn project, focusing on backup and volume management operations. The changes primarily involve refining function signatures, improving backup volume retrieval mechanisms, and enhancing test robustness. A key addition is the Changes
Suggested reviewers
Possibly related PRs
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (17)
manager/integration/tests/test_basic.py (14)
Line range hint
259-259
: Ensure consistent usage of helper function.
Using find_backup_volume keeps the implementation uniform, but confirm that all references to backup volumes in this function now use the same helper for clarity.
Line range hint
315-315
: Double-check variable scoping.
system_backup_name might also be needed outside this block. Consider placing it closer to where it’s first used if necessary.
628-628
: Validate error handling during backup deletion.
When calling delete_backup, consider checking if the backup still exists or if an exception is raised so that errors won't be silently ignored.
697-697
: Potential repeated code for snapshot creation.
If possible, encapsulate repeated snapshot creation logic to reduce duplication.
719-719
: Backup volume object usage.
Ensure the correct backup volume object is being referenced if multiple backups are present in the system.
731-731
: Prevent magic numbers.
RETRY_BACKUP_COUNTS is used directly. Document or clarify its usage to make the code self-explanatory.
877-877
: Check for snapshot infiltration.
When combining backups and snapshots, confirm that older snapshots aren’t inadvertently re-used.
978-978
: Update docstring for backup_status_for_unavailable_replicas_test.
Current docstring might need to mention concurrency checks around node scheduling for clarity.
1073-1073
: Add clarifying comment on newly introduced logic.
The new logic for discounting partial data might be unclear to new contributors. A short comment helps future maintainers.
1094-1094
: Ensure consistent naming convention.
The variable dr_vol_name2 diverges from the naming style used in other places. Consider using a uniform naming pattern.
1164-1164
: Enhance error messaging.
If found is not set to True after this loop, add a more descriptive error or debug statement for quicker diagnosis.
1311-1311
: Ensure consistent naming for function arguments.
The function’s argument might differ from the internal naming used in the data object. Consider aligning them for clarity.
1349-1349
: Enhance documentation for post-deletion verification.
If post-deletion checks are essential, add inline doc or logging to ensure a future dev grasps why they’re needed.
1854-1854
: Consider more robust error handling.
When dealing with corrupted backup files, ensure an exception triggers a user-friendly error or retry logic.manager/integration/tests/test_system_backup_restore.py (1)
315-315
: Clarify naming.
Consider naming the variable more specifically, for example backup_name_for_validation, to reflect its usage.manager/integration/tests/test_statefulset.py (1)
60-60
: Rename attribute for clarity.
Instead of volumeName, consider a more descriptive attribute if this field is used for verifying the PV name, e.g. “associatedVolumeName”.manager/integration/tests/test_metric.py (1)
768-769
: Consider adding a comment explaining the sleep durationThe sleep duration has been increased from 60s to 90s. Consider adding a comment explaining why this increase was necessary to help future maintainers understand the timing requirements.
- time.sleep(90) + # Wait 90 seconds for the recurring backup job to complete + # Previous value of 60 seconds was insufficient in some cases + time.sleep(90)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
manager/integration/tests/backupstore.py
(1 hunks)manager/integration/tests/common.py
(4 hunks)manager/integration/tests/test_basic.py
(28 hunks)manager/integration/tests/test_csi.py
(1 hunks)manager/integration/tests/test_kubernetes.py
(2 hunks)manager/integration/tests/test_metric.py
(4 hunks)manager/integration/tests/test_recurring_job.py
(10 hunks)manager/integration/tests/test_statefulset.py
(1 hunks)manager/integration/tests/test_system_backup_restore.py
(4 hunks)
🔇 Additional comments (35)
manager/integration/tests/test_basic.py (17)
Line range hint 12-12
: Use of new helper function looks good.
Importing find_backup_volume centralizes the logic of retrieving the backup volume, improving consistency.
Line range hint 325-325
: Frequent repeated calls to find_backup_volume.
If performance is a concern, try storing the result in a local variable unless there's a specific need to refresh it each time.
637-637
: Check concurrency in backup volume usage.
If multiple backups are triggered simultaneously, confirm the new helper function handles concurrency gracefully without partial updates.
703-703
: Ensure robust error handling for backup creation.
After create_backup, consider verifying backup status is InProgress, then transitions to Completed, to catch any intermediate failures.
709-709
: Double-check indexing during data writes.
Make sure different data sets aren’t overwriting each other if offset or length remains constant. This helps avoid potential data collisions.
724-724
: Consider edge cases for indefinite wait.
In loops waiting for backup state, add a timeout or maximum retry mechanism to avoid infinite loops if something goes wrong.
741-741
: Consider concurrency in backup volume removal.
If other processes are reading the backup volume, ensure they handle the volume object’s removal gracefully.
861-861
: Validate logic for offset overlap.
When writing data at overlapping offsets, confirm that the final data matches the expected content after multiple writes.
1088-1088
: Potential performance risk.
Frequent read/write calls in tight loops might degrade performance. Consider batching or exponential backoff where possible.
1159-1159
: Avoid overshadowing variables.
In nested loops, be sure that backing up the same variable name does not overshadow variables declared in outer scopes.
1162-1162
: Consider verifying backup states thoroughly.
The logic checks for the presence of backup(1), but might not confirm other attributes, such as CreatedTime or URL correctness.
1281-1281
: Double-check concurrency issues in delete_backup.
If multiple tests run in parallel, ensure the backup reference is unique and not shared with another test.
1282-1282
: Confirm leftover references after removal.
After the backups are deleted, confirm that references in data structures are also removed to avoid memory usage growth.
1443-1443
: Validate cleanup logic.
After a restore operation gets canceled or fails, confirm that leftover volumes or stale references are cleared.
1739-1739
: Test coverage for volume with no backups.
Line 1739 references volume lastBackup; ensure there’s coverage for volumes that never had backups.
1890-1890
: Potential leftover references.
Deleting a backup volume may leave behind references if the code doesn’t handle ongoing backup tasks. Confirm that’s not an issue.
1891-1891
: Edge case: leftover block files.
After removing the backup volume, confirm no block files remain that might mislead future backups or cause disk usage issues.
manager/integration/tests/test_system_backup_restore.py (3)
12-12
: Use of find_backup_volume import.
This import aligns with the approach in other files, ensuring the backup volume retrieval logic is unified.
259-259
: Double-check concurrency with find_backup_volume.
If the system backup interacts with multiple volumes, ensure calls to find_backup_volume do not overshadow each other.
325-325
: Validate consistent usage of backup volume across steps.
Ensure that the same backup volume object is passed to subsequent calls, preventing mismatch if multiple backups exist.
manager/integration/tests/backupstore.py (1)
151-152
: Early-return optimization.
Skipping an unnecessary update is efficient. Ensure that no side effects are missed when the backup target remains unchanged.
manager/integration/tests/test_metric.py (3)
43-43
: LGTM: Import aligns with PR objective
The addition of find_backup_volume
import aligns with the PR's objective to fix test cases for multiple backup stores.
Also applies to: 43-43
721-721
: LGTM: Improved backup volume handling
The change to unpack backup volume from create_backup
improves clarity and consistency in backup handling.
750-750
: LGTM: Consistent backup deletion
Using delete_backup_volume
directly with the backup volume object improves code consistency.
manager/integration/tests/test_kubernetes.py (3)
588-588
: LGTM: Improved backup volume retrieval
Using volbv
to store the backup volume improves code clarity and maintainability.
Line range hint 594-599
: LGTM: Added retry logic for backup volume labels
The addition of retry logic for backup volume labels handles potential race conditions when the backup volume or its labels are not immediately available.
632-632
: LGTM: Consistent backup deletion
Updated delete_backup
call to pass the backup volume object directly, maintaining consistency with other changes in the PR.
manager/integration/tests/test_csi.py (1)
249-249
: LGTM: Consistent backup deletion
Updated delete_backup
call to pass the backup volume object directly, maintaining consistency with the backup volume handling changes across the test suite.
manager/integration/tests/test_recurring_job.py (5)
Line range hint 515-527
: Improve backup volume handling with retry mechanism
The code now uses the enhanced find_backup_volume()
function with retry capability and proper error handling for backup volume operations.
Line range hint 588-603
: Enhance backup volume status verification
The code improves the verification of Kubernetes status labels on backup volumes with better error handling and assertions.
785-788
: Add retry mechanism for backup volume count verification
The code adds proper retry logic when waiting for backup volume count, improving test reliability.
2001-2013
: Improve backup volume verification with retry mechanism
The code enhances backup volume verification by using the retry-capable find_backup_volume()
function and proper error handling.
1074-1077
: Enhance backup volume handling in multiple volume tests
The code improves backup volume operations in multiple volume test scenarios with better error handling and retry mechanisms.
Also applies to: 1591-1591, 1602-1603
manager/integration/tests/common.py (2)
3221-3227
: Add retry-capable backup volume lookup function
The new find_backup_volume()
function provides a robust way to find backup volumes with retry capability, improving test reliability.
3956-3966
: Enhance backup volume wait function
The modified wait_for_backup_volume()
function improves error handling and adds proper assertions for backup volume state verification.
assert backupstore_count_backup_block_files(client, | ||
core_api, | ||
volume_name) == 2 | ||
|
||
delete_backup(client, volume_name, backup2.name) | ||
delete_backup(client, backup_volume, backup2.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check for partial deletions.
When deleting backups, ensure partial or incomplete states are cleaned up. This can prevent leftover references.
|
||
delete_backup_volume(client, volume_name) | ||
delete_backup_volume(client, bv.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential race condition.
Deleting a backup volume while a backup is in progress might cause a race condition if not handled. Use robust checks.
bv3, b3 = find_backup(client, volume3_name, snap3.name) | ||
delete_backup(client, bv3, b3.name) | ||
bv4, b4 = find_backup(client, volume3_name, snap4.name) | ||
delete_backup(client, bv3, b4.name) | ||
bv5, b5 = find_backup(client, volume3_name, snap5.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Bulk deletion of multiple backups.
Deleting multiple backups in a quick succession can be error-prone. Possibly handle them in a single loop with consistent checks.
3202cb2
to
1263a63
Compare
@@ -148,6 +148,8 @@ def set_backupstore_azurite(client): | |||
|
|||
def set_backupstore_url(client, url): | |||
backup_target_setting = client.by_id_setting(SETTING_BACKUP_TARGET) | |||
if backup_target_setting.value == url: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't the same backup target be set twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove this at next update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The REST API itself rejects the request if the backup target is the same, but why?
(ApiError(...), "500 : failed to set the setting backup-target with invalid value s3://backupbucket@us-east-1/backupstore: url s3://backupbucket@us-east-1/backupstore is the same to backup target default\n{'code': 500,
'detail': '', 'message': 'failed to set the setting backup-target with invalid value s3://backupbucket@us-east-1/backupstore: url s3://backupbucket@us-east-1/backupstore is the same to backup target default', 'status': 500}")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a limit in the multiple backup stores PR (it's meanless to update the same URL for a backup target), but I would like to allow it in the PR longhorn/longhorn-manager#3360
When I tried to list backups from a backup volume by calling the REST API like:
The backups from other volumes were also listed:
Is this expected behavior? |
It will be fixed after longhorn/longhorn-manager#3360. |
Using PR code to run backup test cases( |
tests.test_basic.test_backup_metadata_deletion[s3] is related to issue that @yangchiu mentioned above, |
1263a63
to
eaa1b4d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (12)
manager/integration/tests/backupstore.py (4)
36-36
: Clarify semantic meaning of default backup target.
The constant DEFAULT_BACKUPTARGET = "default" may mix two concepts: a default backup target reference vs. the string "default" implying system defaults. If there's more than one default, consider clarifying the name.
37-38
: Improve maintainability of error message.
object_has_been_modified is a custom message matching a K8s concurrency error. Consider referencing or linking to the underlying K8s error type to make it easier for future contributors to find the root cause.
164-175
: Confirm atomicity of backup target updates.
The code handles ResourceVersion conflicts by catching object_has_been_modified. If the setting is updated frequently, ensure that no other concurrently updated settings are lost in each retry.
182-193
: Consistent concurrency handling.
Similar concurrency retry block as above. Extract into a helper function to reduce duplication and ensure consistent error handling between backup_target_credential_secret and backupstore_url updates.manager/integration/tests/test_metric.py (2)
721-721
: Verify data consistency before backup creation.
When calling create_backup, confirm that the data written to the volume is fully synced or flushed for consistent backups. Just a best practice reminder.
750-750
: Consider re-check for volume presence before deletion.
delete_backup_volume might raise an error if the backup volume was already removed. A quick check in tests can improve clarity of test logs.manager/integration/tests/test_recurring_job.py (4)
1075-1075
: Check the logic for waiting on backup completion.
wait_for_backup_count looks good. Evaluate the chance of partial backups or concurrency issues when test-job-1 runs multiple backups in close intervals.
1078-1078
: Avoid silent failures with retry=60.
The explicit retry of 60 is quite large. Consider adding an error or debug message each time the code is waiting, to ease troubleshooting.
1592-1592
: Check for concurrency & slow environment.
Same remark: if the environment lags, 30 short tries might not be enough to discover the backup volume. Keep this in mind for CI reliability.
2002-2005
: Restructure repeated call pattern.
The calls to wait_for_backup_count with find_backup_volume are repeated. Possibly unify them into a single helper function with optional arguments for the volume & expected counts.manager/integration/tests/common.py (1)
3956-3966
: Prevent infinite loop & clarify assertion.
wait_for_backup_volume loops for up to RETRY_BACKUP_COUNTS. If the volume is never found, it only asserts at the end. Consider extra logs for partial failures or repeated attempts for debugging.manager/integration/tests/test_basic.py (1)
3241-3247
: Consider adding timeout handling for backup restorationWhile the backup deletion during restoration is handled correctly, consider adding timeout handling for the restoration wait to prevent potential hangs.
bv, b = find_backup(client, std_volume_name, snap1.name) + try: client.create_volume(name=restore_volume_name, fromBackup=b.url, numberOfReplicas=3, dataEngine=DATA_ENGINE) wait_for_volume_restoration_start(client, restore_volume_name, b.name) + except Exception as e: + logging.error("Volume restoration failed: %s", str(e)) + raise
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
manager/integration/tests/backupstore.py
(4 hunks)manager/integration/tests/common.py
(4 hunks)manager/integration/tests/test_basic.py
(28 hunks)manager/integration/tests/test_csi.py
(1 hunks)manager/integration/tests/test_kubernetes.py
(2 hunks)manager/integration/tests/test_metric.py
(4 hunks)manager/integration/tests/test_recurring_job.py
(11 hunks)manager/integration/tests/test_statefulset.py
(1 hunks)manager/integration/tests/test_system_backup_restore.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- manager/integration/tests/test_system_backup_restore.py
- manager/integration/tests/test_csi.py
- manager/integration/tests/test_statefulset.py
- manager/integration/tests/test_kubernetes.py
🔇 Additional comments (22)
manager/integration/tests/backupstore.py (2)
17-18
: Ensure constants are well-defined.
Importing NODE_UPDATE_RETRY_COUNT and NODE_UPDATE_RETRY_INTERVAL from "common" helps unify the global retry logic. Confirm that these constants match their use case to avoid unintended infinite loops or too-long sleeps.
99-106
: Recheck repeated logic in loop.
The for-loop repeatedly checks backupTargetURL, credentialSecret, and pollInterval. If this condition is tied to concurrency logic, ensure it’s not missing error handling or a fallback mechanism in case the settings do not reset to empty.
manager/integration/tests/test_metric.py (2)
43-43
: Centralize function usage.
Importing find_backup_volume for test_metric indicates cross-file usage. Ensure that usage is consistent with any existing find_backup method calls in other test modules to avoid duplication.
768-769
: Provide rational for the 90-second wait.
time.sleep(90) is somewhat large. If it accommodates consistent test timing, consider explaining in a code comment or adjusting dynamically by poll intervals.
manager/integration/tests/test_recurring_job.py (4)
56-56
: Consolidate usage of backup utilities.
find_backup_volume is imported in test_recurring_job. This function is repeated in many tests. Confirm it’s the canonical approach to retrieving a backup volume.
79-79
: Use RETRY_COUNTS_SHORT carefully.
Ensure RETRY_COUNTS_SHORT is not too small to cause flaky test failures, especially under heavier load or slower storage environments.
1580-1580
: Review usage of "retry=RETRY_COUNTS_SHORT".
Similar to line 79 mention, double-check if short retries risk missing the backup volume in slower test setups.
1603-1604
: Distinct line usage for waiting backup counts.
Consider combining into one wait_for_backup_count call if logic is the same. Provided the test scenario is straightforward.
manager/integration/tests/common.py (3)
519-521
: Adapt to standard signature for backup deletions.
Switching delete_backup to accept bv and backup_name clarifies usage. Consider adding code-level docstring that clarifies the difference between volumeName and backup volume object.
524-525
: Confirm backward compatibility.
Renaming the parameter from volume_name to backup_volume_name is good but ensure no old references remain (e.g., "delete_backup_volume(volume_name, ...)" calls).
527-527
: Call wait_for_backup_volume_delete.
This ensures a synchronous test flow. Validate that the code handles any unexpected concurrency errors gracefully if the backup volume was removed in prior steps.
manager/integration/tests/test_basic.py (11)
Line range hint 628-637
: LGTM: Backup deletion and status verification looks correct
The backup deletion and volume status verification logic is properly implemented with appropriate wait conditions.
697-709
: LGTM: Backup creation sequence is well structured
The backup creation sequence with proper data writing and verification is implemented correctly.
719-724
: LGTM: Backup volume validation and cleanup
Good validation of backup volume data and proper cleanup with in-progress backup handling.
731-741
: LGTM: Backup deletion and block count verification
The backup deletion and block count verification is implemented correctly with proper wait conditions.
Line range hint 1281-1311
: LGTM: Backup metadata deletion sequence
The backup metadata deletion sequence is well structured with proper verification of backup states and cleanup.
🧰 Tools
🪛 Ruff (0.8.2)
1284-1284: Loop control variable i
not used within loop body
Rename unused i
to _i
(B007)
Line range hint 1349-1359
: LGTM: Additional backup cleanup logic
Good implementation of backup cleanup with proper block count verification.
🧰 Tools
🪛 Ruff (0.8.2)
1350-1350: Loop control variable i
not used within loop body
Rename unused i
to _i
(B007)
Line range hint 1443-1449
: LGTM: Simple backup deletion in backupstore test
Clean implementation of backup deletion with status verification.
Line range hint 3314-3324
: LGTM: Backup deletion during backup creation
Good implementation of backup deletion during creation with proper verification of backup states.
Also applies to: 3331-3335
3458-3466
: LGTM: Backup deletion during restore with verification
The backup deletion during restore and subsequent verification is implemented correctly.
Also applies to: 3478-3482
4662-4662
: LGTM: Simple backup deletion
Clean implementation of single backup deletion.
5422-5424
: LGTM: Backup deletion during volume restore
Good implementation of backup deletion during volume restore with proper status verification.
for _ in range(NODE_UPDATE_RETRY_COUNT): | ||
try: | ||
backup_target_poll_interal_setting = client.update( | ||
backup_store_poll_interval_setting, value=poll_interval) | ||
except Exception as e: | ||
if object_has_been_modified in str(e.error.message): | ||
time.sleep(NODE_UPDATE_RETRY_INTERVAL) | ||
continue | ||
print(e) | ||
raise | ||
else: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Check poll interval usage & clarity.
set_backupstore_poll_interval is also wrapped in concurrency logic. If the code attempts repeated updates triggered by “object_has_been_modified” errors, consider adding logs to ensure debugging clarity in production.
wait_for_backup_count( | ||
find_backup_volume(client, volume1_name, retry=RETRY_COUNTS_SHORT), 2) | ||
wait_for_backup_count( | ||
find_backup_volume(client, volume2_name, retry=RETRY_COUNTS_SHORT), 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Check consistency for final backup checks.
The second round is for verifying 2 backups. Ensure final test validations match test objectives so that incomplete backups are not counted incorrectly.
def find_backup_volume(client, volume_name, retry=1): | ||
for _ in range(retry): | ||
bvs = client.list_backupVolume() | ||
for bv in bvs: | ||
if bv.volumeName == volume_name and bv.created != "": | ||
return bv | ||
time.sleep(RETRY_BACKUP_INTERVAL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider returning partial result or raising error.
When find_backup_volume cannot find the volume, it returns None after retries. Consider logging a warning or raising a custom exception to help test debugging.
+raise Exception(f"No backup volume found for volume {volume_name} after {retry} retries")
Committable suggestion skipped: line range outside the PR's diff.
63b4f29
to
0284c79
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (2)
manager/integration/tests/test_basic.py (2)
Line range hint
1847-1891
: Improve backup deletion validationThe backup deletion sequence has been improved but needs better validation.
Add validation checks and error handling for backup deletion operations:
bv1, b1 = find_backup(client, volume1_name, snap1.name) + assert b1 is not None, "Backup b1 not found" delete_backup(client, bv1, b1.name) bv2, b2 = find_backup(client, volume2_name, snap2.name) + assert b2 is not None, "Backup b2 not found" delete_backup(client, bv2, b2.name) delete_backup(client, bv3, b3.name) + # Verify backup was deleted + assert wait_for_backup_delete(client, volume_name, b3.name) delete_backup_volume(client, bv3.name) + # Verify backup volume was deleted + assert wait_for_backup_volume_delete(client, bv3.name)
Line range hint
3314-3324
: Enhance backup deletion error handlingThe backup deletion error handling could be improved.
Add error handling and validation for backup deletion:
+ # Validate backup exists before deletion + assert b1 is not None, "Backup b1 not found" backup_volume = client.by_id_backupVolume(bv.name) + try: backup_volume.backupDelete(name=b1.name) + except Exception as e: + logging.error(f"Failed to delete backup {b1.name}: {str(e)}") + raiseAlso applies to: 3331-3335
🧹 Nitpick comments (6)
manager/integration/tests/test_metric.py (1)
768-769
: Consider adding a comment explaining the sleep durationThe sleep duration was increased from 60s to 90s. Consider adding a comment explaining why this longer duration is necessary for the recurring backup job to complete.
+ # Wait 90 seconds to ensure the recurring backup job completes time.sleep(90)
manager/integration/tests/test_basic.py (4)
Line range hint
628-637
: Improve backup deletion error handlingThe backup deletion code has been simplified but could benefit from additional error handling for edge cases.
Consider adding error handling and retries for backup deletion failures:
- delete_backup(client, bv, b.name) + try: + delete_backup(client, bv, b.name) + except Exception as e: + logging.error(f"Failed to delete backup {b.name}: {str(e)}") + raise
697-709
: Improve backup creation and deletion flowThe backup creation and deletion sequence has been improved but could be more robust.
Consider adding validation checks before backup operations:
+ # Validate backup exists before deletion + assert backup0 is not None, "Backup0 not found" _, backup0, _, _ = create_backup(client, volume_name, data0) + # Validate backup exists before deletion + assert backup1 is not None, "Backup1 not found" _, backup1, _, _ = create_backup(client, volume_name, data1) + # Validate backup exists before deletion + assert backup2 is not None, "Backup2 not found" backup_volume, backup2, _, _ = create_backup(client, volume_name, data2)
3241-3247
: Improve backup volume retrieval and deletionThe backup volume retrieval and deletion flow could be more robust.
Add validation for backup volume existence:
+ # Validate backup volume exists before operations + assert bv is not None, "Backup volume not found" bv, b = find_backup(client, std_volume_name, snap1.name)
5422-5422
: Add validation for backup deletionThe backup deletion could use additional validation.
Add validation check before backup deletion:
+ assert b is not None, "Backup not found" delete_backup(client, bv, b.name)
manager/integration/tests/common.py (1)
3221-3227
: Consider adding error logging for debugging.While the retry mechanism is good, consider adding debug logging when retries occur to help troubleshoot backup volume lookup issues.
def find_backup_volume(client, volume_name, retry=1): for _ in range(retry): bvs = client.list_backupVolume() for bv in bvs: if bv.volumeName == volume_name and bv.created != "": return bv + if retry > 1: + print(f"Retrying backup volume lookup for {volume_name}") time.sleep(RETRY_BACKUP_INTERVAL) return None
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
manager/integration/tests/backupstore.py
(4 hunks)manager/integration/tests/common.py
(4 hunks)manager/integration/tests/test_basic.py
(28 hunks)manager/integration/tests/test_csi.py
(1 hunks)manager/integration/tests/test_kubernetes.py
(2 hunks)manager/integration/tests/test_metric.py
(4 hunks)manager/integration/tests/test_recurring_job.py
(11 hunks)manager/integration/tests/test_statefulset.py
(1 hunks)manager/integration/tests/test_system_backup_restore.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- manager/integration/tests/test_system_backup_restore.py
- manager/integration/tests/test_statefulset.py
- manager/integration/tests/test_csi.py
- manager/integration/tests/test_kubernetes.py
- manager/integration/tests/test_recurring_job.py
🔇 Additional comments (8)
manager/integration/tests/test_metric.py (1)
721-721
: LGTM: Improved backup deletion with direct backup volume reference
The code now uses the backup volume object directly instead of just the name, making the deletion more robust.
manager/integration/tests/backupstore.py (5)
36-38
: LGTM: Clear and descriptive constant definitions.
The new constants improve code readability and maintainability by centralizing common values.
99-105
: LGTM: Robust retry mechanism for resetting backup store settings.
The retry mechanism ensures reliable backup store settings reset by:
- Checking backup target URL, credential secret and poll interval
- Using configurable retry count and interval
163-174
: LGTM: Proper error handling for backup store URL updates.
The code properly handles concurrent modification errors by:
- Retrying on "object has been modified" errors
- Using configurable retry count and interval
- Preserving other error types for debugging
181-192
: LGTM: Consistent error handling for credential updates.
The error handling matches the pattern used for URL updates, maintaining consistency across the codebase.
199-210
: LGTM: Consistent error handling for poll interval updates.
The error handling follows the same robust pattern used for other backup store settings.
manager/integration/tests/common.py (2)
519-521
: LGTM: Simplified backup deletion API usage.
The code now accepts the backup volume object directly, eliminating redundant volume name lookup.
524-527
: LGTM: Simplified backup volume deletion API usage.
The code now accepts the backup volume name directly, improving clarity and reducing indirection.
if bv['name'] == backup_volume.name: | ||
assert bv['dataStored'] == \ | ||
str(backup_blocks_count * BACKUP_BLOCK_SIZE) | ||
|
||
backupstore_create_dummy_in_progress_backup(client, core_api, volume_name) | ||
delete_backup(client, volume_name, backup1.name) | ||
delete_backup(client, backup_volume, backup1.name) | ||
assert backupstore_count_backup_block_files(client, | ||
core_api, | ||
volume_name) == 4 | ||
|
||
backupstore_delete_dummy_in_progress_backup(client, core_api, volume_name) | ||
|
||
delete_backup(client, volume_name, backup0.name) | ||
delete_backup(client, backup_volume, backup0.name) | ||
assert backupstore_count_backup_block_files(client, | ||
core_api, | ||
volume_name) == 2 | ||
|
||
delete_backup(client, volume_name, backup2.name) | ||
delete_backup(client, backup_volume, backup2.name) | ||
assert backupstore_count_backup_block_files(client, | ||
core_api, | ||
volume_name) == 0 | ||
|
||
delete_backup_volume(client, volume_name) | ||
delete_backup_volume(client, backup_volume.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Enhance backup volume deletion flow
The backup volume deletion logic has been improved but could use additional validation.
Add validation for backup volume existence and status before deletion:
+ # Validate backup volume exists
+ assert backup_volume is not None, "Backup volume not found"
+
if bv['name'] == backup_volume.name:
assert bv['dataStored'] == \
str(backup_blocks_count * BACKUP_BLOCK_SIZE)
delete_backup(client, backup_volume, backup1.name)
+ # Verify backup was deleted
+ assert wait_for_backup_delete(client, volume_name, backup1.name)
delete_backup(client, backup_volume, backup0.name)
+ # Verify backup was deleted
+ assert wait_for_backup_delete(client, volume_name, backup0.name)
delete_backup(client, backup_volume, backup2.name)
+ # Verify backup was deleted
+ assert wait_for_backup_delete(client, volume_name, backup2.name)
delete_backup_volume(client, backup_volume.name)
Committable suggestion skipped: line range outside the PR's diff.
Does this PR include all the test cases that need to be refined? |
3b083d0
to
8599ce7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
manager/integration/tests/test_csi_snapshotter.py (1)
1368-1368
: Validate usage of new “storage_class” parameter
It seems you’ve introduced a new "storage_class" parameter to this test function’s signature. The parameter’s usage in subsequent restore logic appears valid. However, ensure that you consistently handle or test different storage classes (e.g., beyond the default) to confirm that no additional modifications are needed elsewhere.manager/integration/tests/backupstore.py (2)
99-105
: LGTM! Robust retry mechanism for backup store reset.The retry logic properly verifies that the backup target settings are completely reset before proceeding. Good use of constants for retry count and interval.
Consider adding a debug log when retrying to help with troubleshooting:
+ print(f"Waiting for backup target reset, attempt {_+1}") time.sleep(NODE_UPDATE_RETRY_INTERVAL)
163-174
: LGTM! Good error handling and retry logic.The retry mechanism properly handles object modification conflicts and validates the URL was set correctly.
Consider extracting the error message check into a helper function since it's used in multiple places:
def is_object_modified_error(error): return object_has_been_modified in str(error.error.message)This would improve code reuse and make the error handling more maintainable.
manager/integration/tests/test_basic.py (1)
741-741
: LGTM! Consistent backup volume deletion handling.The changes maintain a consistent pattern for backup volume deletion and proper cleanup.
Consider adding a helper function to encapsulate the common pattern of deleting a backup and waiting for completion:
def delete_backup_and_wait(client, backup_volume, backup_name, volume_name): """Delete a backup and wait for it to be removed.""" delete_backup(client, backup_volume, backup_name) wait_for_backup_delete(client, volume_name, backup_name) volume = wait_for_volume_status(client, volume_name, "lastBackup", "") assert volume.lastBackupAt == ""This would reduce code duplication and make the cleanup flow more maintainable.
Also applies to: 1094-1094, 1311-1349, 1890-1891
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
manager/integration/tests/backupstore.py
(4 hunks)manager/integration/tests/common.py
(4 hunks)manager/integration/tests/test_basic.py
(28 hunks)manager/integration/tests/test_csi.py
(1 hunks)manager/integration/tests/test_csi_snapshotter.py
(2 hunks)manager/integration/tests/test_kubernetes.py
(2 hunks)manager/integration/tests/test_metric.py
(4 hunks)manager/integration/tests/test_recurring_job.py
(11 hunks)manager/integration/tests/test_statefulset.py
(1 hunks)manager/integration/tests/test_system_backup_restore.py
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- manager/integration/tests/test_statefulset.py
- manager/integration/tests/test_system_backup_restore.py
- manager/integration/tests/test_kubernetes.py
- manager/integration/tests/test_csi.py
- manager/integration/tests/test_recurring_job.py
- manager/integration/tests/test_metric.py
- manager/integration/tests/common.py
🔇 Additional comments (3)
manager/integration/tests/test_csi_snapshotter.py (1)
1492-1492
: Potential race condition with “wait_for_restore=False”
Disabling the wait for volume restoration may lead to a scenario where the test proceeds before the volume is fully restored, potentially causing unreliable test outcomes. Consider confirming that the volume is ready before the next steps to avoid intermittent failures.
manager/integration/tests/backupstore.py (1)
36-38
: LGTM! Well-defined constants.
The new constants are well-named and follow Python naming conventions. The error message string is properly formatted for readability.
manager/integration/tests/test_basic.py (1)
628-628
: LGTM! Improved type safety in delete_backup() calls.
The changes consistently pass backup volume objects instead of names to delete_backup(), which:
- Improves type safety
- Reduces potential errors from incorrect string names
- Makes the code more maintainable
Also applies to: 637-637, 1281-1282, 1311-1311, 1324-1324, 1349-1349, 1443-1443, 4662-4662
for _ in range(NODE_UPDATE_RETRY_COUNT): | ||
try: | ||
backup_target_credential_setting = client.update( | ||
backup_target_credential_setting, value=credential_secret) | ||
except Exception as e: | ||
if object_has_been_modified in str(e.error.message): | ||
time.sleep(NODE_UPDATE_RETRY_INTERVAL) | ||
continue | ||
print(e) | ||
raise | ||
else: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider extracting common retry logic.
The retry mechanism is duplicated across set_backupstore_url(), set_backupstore_credential_secret() and set_backupstore_poll_interval().
Consider extracting the retry logic into a decorator or helper function:
def retry_on_conflict(func):
def wrapper(*args, **kwargs):
for _ in range(NODE_UPDATE_RETRY_COUNT):
try:
return func(*args, **kwargs)
except Exception as e:
if object_has_been_modified in str(e.error.message):
time.sleep(NODE_UPDATE_RETRY_INTERVAL)
continue
print(e)
raise
return wrapper
@retry_on_conflict
def set_backupstore_url(client, url):
backup_target_setting = client.by_id_setting(SETTING_BACKUP_TARGET)
backup_target_setting = client.update(backup_target_setting, value=url)
assert backup_target_setting.value == url
This would:
- Eliminate code duplication
- Make the retry logic more maintainable
- Make it easier to add retry to other functions
Also applies to: 199-210
It should be, but I might lose some. I have some failed test cases that were not re-run. "test_basic.py::test_dr_volume_activated_with_failed_replica",
"test_basic.py::test_allow_volume_creation_with_degraded_availability_dr",
"test_engine_upgrade.py::test_auto_upgrade_engine_to_default_version_dr_volume",
"test_ha.py::test_engine_crash_for_dr_volume",
"test_ha.py::test_engine_image_miss_scheduled_perform_volume_operations",
"test_ha.py::test_engine_image_not_fully_deployed_perform_volume_operations",
"test_ha.py::test_engine_image_not_fully_deployed_perform_dr_restoring_expanding_volume", Update: ============================= test session starts ==============================
platform linux -- Python 3.11.10, pytest-6.2.4, py-1.11.0, pluggy-0.13.1 -- /usr/bin/python3.11
cachedir: .pytest_cache
rootdir: /integration, configfile: pytest.ini
plugins: order-1.0.1, repeat-0.9.1
collecting ... collected 7 items
test_basic.py::test_dr_volume_activated_with_failed_replica[s3] PASSED [ 14%]
test_basic.py::test_allow_volume_creation_with_degraded_availability_dr[s3] PASSED [ 28%]
test_engine_upgrade.py::test_auto_upgrade_engine_to_default_version_dr_volume[s3] PASSED [ 42%]
test_ha.py::test_engine_crash_for_dr_volume[s3] PASSED [ 57%]
test_ha.py::test_engine_image_miss_scheduled_perform_volume_operations[s3] PASSED [ 71%]
test_ha.py::test_engine_image_not_fully_deployed_perform_volume_operations[s3] PASSED [ 85%]
test_ha.py::test_engine_image_not_fully_deployed_perform_dr_restoring_expanding_volume[s3] PASSED [100%]
================== 7 passed, 8 warnings in 1586.42s (0:26:26) ================== Looks fine in my local cluster. |
8599ce7
to
d1a5ab9
Compare
I'm not sure if these failures have been fixed: https://ci.longhorn.io/job/private/job/longhorn-tests-regression/7817/testReport/. Could you please help confirm? |
tests.test_recurring_job.test_recurring_job_groups
tests.test_recurring_job.test_recurring_job_multiple_volume Yes, I think these are only two left, and I still look into the cases. |
ref: longhorn/longhorn 5411 Signed-off-by: James Lu <[email protected]>
d1a5ab9
to
ccf3cdd
Compare
@mantissahz |
@coderabbitai pause |
✅ Actions performedReviews paused. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Which issue(s) this PR fixes:
Issue # longhorn/longhorn#5411
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Refactor
Tests