
test(robot): expand PVC more than storage maximum size #2178

Merged: 7 commits, Dec 10, 2024

Conversation


@c3y1huang (Collaborator) commented Dec 6, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#6633

What this PR does / why we need it:

Add a robot test case to verify that a PVC cannot be expanded beyond the storage maximum size.

Special notes for your reviewer:

None

Additional documentation or context:

None

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced persistent volume claim creation with flexible configuration options.
    • New keyword for verifying persistent volume claim requested size over time.
    • New methods for volume size retrieval and maximum disk storage checks.
    • Added functionality for expanding workloads and persistent volume claims with improved size management.
    • New utility functions for managing custom resources and size conversions.
    • Enhanced logging and error handling in AWS operations.
    • New test suite for validating persistent volume claim behavior, including checks for expansion limits.
  • Bug Fixes

    • Improved error handling and logging in backup operations.
    • Enhanced error handling and logging in Rest class methods.
    • Refined error handling for annotation setting in utility functions.
  • Style

    • Updated phrasing in test cases for clarity regarding volume expansion actions.

@c3y1huang self-assigned this Dec 6, 2024

coderabbitai bot commented Dec 6, 2024

Walkthrough

The pull request introduces multiple enhancements to the handling of persistent volume claims (PVCs) and workload management in the codebase. Key modifications include the addition of new keywords for volume size assertions and expansion, as well as updates to existing methods to improve flexibility and error handling. Notably, the changes aim to prevent issues related to volume resizing, ensuring that operations do not exceed available storage capacity. A new test suite is also added to validate the behavior of PVCs under various conditions.

Changes

File Path | Change Summary
e2e/keywords/persistentvolumeclaim.resource | Updated Create persistentvolumeclaim to accept &{config}; renamed ${pvc_name} to ${claim_name} in Delete persistentvolumeclaim for volume; added Assert persistentvolumeclaim ${claim_id} requested size remains ${size} for at least ${period} seconds.
e2e/keywords/statefulset.resource | Removed Expand statefulset ${statefulset_id} volume by ${size} MiB.
e2e/keywords/workload.resource | Added Expand ${workload_kind} ${workload_id} volume by ${size} and Expand ${workload_kind} ${workload_id} volume more than storage maximum size should fail; added Assert volume size of ${workload_kind} ${workload_id} remains ${size} for at least ${period} seconds.
e2e/libs/backup/backup.py | Updated _strategy to use LonghornOperationStrategy.REST; simplified constructor logic for backup strategy selection.
e2e/libs/backup/base.py | Added Base class with multiple abstract and concrete methods for backup management.
e2e/libs/backup/rest.py | Enhanced error handling and logging in methods like create, get, and wait_for_backup_completed.
e2e/libs/backupstore/base.py | Removed unused imports and added a blank line for readability.
e2e/libs/backupstore/nfs.py | Reorganized import statements without affecting functionality.
e2e/libs/backupstore/s3.py | Updated get_api_client to handle base64_minio_cert and modified file handling methods for better error management.
e2e/libs/engine_image/engine_image.py | Removed unused import for logging.
e2e/libs/host/aws.py | Enhanced logging in methods for rebooting nodes.
e2e/libs/host/harvester.py | Reorganized import statements without changing functionality.
e2e/libs/k8s/k8s.py | Reorganized import statements; no functional changes.
e2e/libs/keywords/common_keywords.py | Added convert_size_to_bytes method for size conversions.
e2e/libs/keywords/persistentvolumeclaim_keywords.py | Updated create_persistentvolumeclaim to include storage_size; added get_claim_requested_size method.
e2e/libs/keywords/volume_keywords.py | Added get_volume_size and get_volume_node_disk_storage_maximum methods.
e2e/libs/keywords/workload_keywords.py | Renamed expand_workload_claim_size_by_mib to expand_workload_claim_size and updated its parameters.
e2e/libs/node/node.py | Updated get_node_by_name to include namespace parameter.
e2e/libs/persistentvolumeclaim/crd.py | Updated expand method to include skip_retry parameter.
e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py | Updated create and expand methods to include storage_size and skip_retry parameters, respectively.
e2e/libs/utility/utility.py | Added functions for custom resource retrieval and size conversion.
e2e/libs/volume/crd.py | Removed unused import for os.
e2e/tests/regression/test_persistentvolumeclaim.robot | Added a new test suite for validating PVC behavior, including a test case for volume expansion beyond maximum size.

Assessment against linked issues

Objective | Addressed | Explanation
Prevent volume resizing beyond maximum storage size (Issue #6633) | |
Implement checks before volume expansion to avoid stuck states (Issue #6633) | |

Possibly related PRs

  • test(robot): migrate test_replica_rebuild_per_volume_limit #2159: The changes in the main PR introduce a new keyword related to persistent volume claims, while this PR focuses on migrating a test case for replica rebuilding, which also involves volume management. Both PRs enhance functionality related to volume operations in Kubernetes.

Suggested reviewers

  • chriscchien

Poem

In the land of bytes and claims,
A rabbit hops and plays new games.
Expanding volumes, oh so bright,
With checks in place, we’ll do it right!
So let’s rejoice, in code we trust,
For every change, it’s a must! 🐇✨



@c3y1huang force-pushed the 6633-prevent-resize-stuck branch from 7aecb1c to f8ae949 on December 6, 2024 00:07

@c3y1huang (Collaborator, Author) commented:

@coderabbitai review


coderabbitai bot commented Dec 6, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Outside diff range and nitpick comments (10)
e2e/libs/host/aws.py (2)

Line range hint 18-35: Add error handling for AWS waiter timeouts

The code uses AWS waiters to check instance states but doesn't handle potential timeout exceptions. This could lead to test failures if AWS operations take longer than expected.

Consider wrapping waiter calls with try-except blocks:

-        waiter = self.aws_client.get_waiter('instance_stopped')
-        waiter.wait(InstanceIds=instance_ids)
+        try:
+            waiter = self.aws_client.get_waiter('instance_stopped')
+            waiter.wait(
+                InstanceIds=instance_ids,
+                WaiterConfig={'Delay': 15, 'MaxAttempts': 40}
+            )
+        except Exception as e:
+            logging(f"Error waiting for instances to stop: {e}")
+            raise

Also applies to: 36-52, 53-69, 70-78, 79-87


Line range hint 18-35: Document implications of force shutdown

The Force=True parameter in stop_instances calls might lead to data corruption or incomplete operations. This should be documented and potentially made configurable.

Add a docstring explaining the force shutdown behavior:

     def reboot_all_nodes(self, shut_down_time_in_sec=NODE_REBOOT_DOWN_TIME_SECOND):
+        """Reboot all nodes in the cluster.
+        
+        Args:
+            shut_down_time_in_sec (int): Time to wait between stop and start operations
+        
+        Note:
+            Uses force shutdown which might lead to abrupt termination of workloads.
+            Ensure your test cases can handle potential data loss or corruption.
+        """

Also applies to: 36-52, 53-69

e2e/libs/backup/base.py (2)

Line range hint 8-10: Replace return NotImplemented with raise NotImplementedError() in abstract methods

In the abstract methods, using return NotImplemented is inappropriate. Instead, you should raise a NotImplementedError() to indicate that the method must be implemented by subclasses. This ensures that if the method is accidentally called, an exception is raised.

Apply this diff to correct the abstract methods:

     @abstractmethod
     def create(self, volume_name, backup_id):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def get(self, backup_id, volume_name):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def get_backup_volume(self, volume_name):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def list(self, volume_name):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def list_all(self):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def delete(self, volume_name, backup_id):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def delete_backup_volume(self, volume_name):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def restore(self, volume_name, backup_id):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def check_restored_volume_checksum(self, volume_name, backup_name):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def get_restored_checksum(self, backup_name):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def cleanup_backup_volumes(self):
-        return NotImplemented
+        raise NotImplementedError()

     @abstractmethod
     def cleanup_system_backups(self):
-        return NotImplemented
+        raise NotImplementedError()

Also applies to: 25-26, 36-37, 47-48, 54-55, 59-60, 65-66, 70-71


Line range hint 29-30: Add @abstractmethod decorator to abstract methods without it

The methods get_by_snapshot and wait_for_backup_completed are intended to be abstract but are missing the @abstractmethod decorator. Additionally, they should raise NotImplementedError() instead of returning NotImplemented.

Apply this diff to make the methods abstract and properly signal their intent:

+    @abstractmethod
     def get_by_snapshot(self, volume_name, snapshot_name):
-        return NotImplemented
+        raise NotImplementedError()

+    @abstractmethod
     def wait_for_backup_completed(self, volume_name, snapshot_name):
-        return NotImplemented
+        raise NotImplementedError()

Also applies to: 33-34
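As a generic illustration (not code from this PR) of why the decorator matters: returning NotImplemented hands the caller a sentinel it must remember to check, raising NotImplementedError fails loudly, and @abstractmethod stops an incomplete subclass from being instantiated at all. A minimal sketch:

    from abc import ABC, abstractmethod

    class Base(ABC):
        @abstractmethod
        def get_by_snapshot(self, volume_name, snapshot_name):
            raise NotImplementedError()

    class Incomplete(Base):
        pass

    # Base cannot be instantiated, and Incomplete() raises TypeError because
    # get_by_snapshot is not overridden; without @abstractmethod the missing
    # method would only surface when it is eventually called.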

e2e/keywords/persistentvolumeclaim.resource (1)

24-26: Ensure consistent naming by using 'claim' instead of 'volume' in claim name generation

In the keyword Delete persistentvolumeclaim for volume ${volume_id}, the claim name is generated using volume as the kind. For consistency with other keywords and to avoid confusion, it should use claim.

Apply this diff to adjust the kind used in the claim name:

     Delete persistentvolumeclaim for volume ${volume_id}
-        ${claim_name} =    generate_name_with_suffix    volume    ${volume_id}
+        ${claim_name} =    generate_name_with_suffix    claim    ${volume_id}
         delete_persistentvolumeclaim    ${claim_name}
e2e/libs/persistentvolumeclaim/crd.py (2)

Line range hint 108-109: Enhance error message for expansion failures

Consider making the error message more specific to help with debugging.

-        assert False, f"Failed to expand {claim_name} size to {size}"
+        assert False, f"Failed to expand PVC {claim_name} from {current_size} to {size} after {retry_count} attempts"

Line range hint 87-109: Consider additional test scenarios

To comprehensively test PVC expansion behavior, consider adding test cases for:

  1. Expanding to exactly the maximum size
  2. Expanding with decimal values
  3. Expanding with different size units (Ki, Mi, Gi, Ti)
  4. Concurrent expansion requests

This will help ensure robust handling of all edge cases.
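For the size-unit case in particular, a small pytest sketch could look like the following, assuming the convert_size_to_bytes helper introduced in this PR is importable (import path assumed; Ki/Ti units would first need the extension suggested later in this review):

    import pytest

    from utility.utility import convert_size_to_bytes  # import path assumed

    @pytest.mark.parametrize("size,expected", [
        ("1MiB", 1024 ** 2),        # mebibytes
        ("2GiB", 2 * 1024 ** 3),    # gibibytes
        ("2 GiB", 2 * 1024 ** 3),   # spaces are stripped before parsing
        ("123", 123),               # plain digits are taken as bytes
    ])
    def test_convert_size_to_bytes(size, expected):
        assert convert_size_to_bytes(size) == expected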

e2e/libs/keywords/volume_keywords.py (2)

2-2: Remove unused import

The json module is imported but never used in this file.

-import json
🧰 Tools
🪛 Ruff (0.8.0)

2-2: json imported but unused

Remove unused import: json

(F401)


343-360: Consider adding unit tests

The new methods are crucial for testing PVC expansion limits. Consider adding unit tests to verify their behavior, especially edge cases like:

  • Non-existent volumes
  • Volumes without replicas
  • Replicas not scheduled to any disk

Would you like me to help generate unit tests for these methods?
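As a starting point, a hedged sketch of the no-replica case, stubbing the keyword class's collaborators with mocks (class and import names assumed from this PR's diffs):

    from unittest.mock import MagicMock

    import pytest

    from keywords.volume_keywords import volume_keywords  # path assumed

    def test_storage_maximum_lookup_without_replicas():
        vk = volume_keywords.__new__(volume_keywords)  # skip __init__ wiring
        vk.replica = MagicMock()
        vk.replica.get.return_value = []  # no replicas for this volume/node
        vk.node = MagicMock()
        # Accessing replica_list[0] on an empty list should raise.
        with pytest.raises(Exception):
            vk.get_volume_node_disk_storage_maximum("vol-0", "node-0")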

🧰 Tools
🪛 Ruff (0.8.0)

360-360: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)
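For context, the B011 point in miniature (generic Python, not from this diff):

    def fail(msg):
        # assert False, msg         # stripped entirely under `python -O`
        raise AssertionError(msg)   # raises regardless of optimization level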

e2e/libs/keywords/workload_keywords.py (1)

163-169: Consider adding size validation checks

While the current changes provide the foundation for testing PVC expansion beyond the maximum size, consider enhancing the method to optionally validate whether the requested size exceeds the available storage capacity. This would align with the linked issue's goal of preventing volumes from getting stuck mid-resize.

Would you like me to propose an implementation that includes size validation checks?
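One possible shape for that validation, as a hedged sketch: the storage_maximum parameter is hypothetical and would be supplied by the caller (for example via the get_volume_node_disk_storage_maximum keyword added in this PR), and the helper and attribute names mirror this diff but are assumptions:

    def expand_workload_claim_size(self, workload_name, size_in_byte,
                                   claim_index=0, skip_retry=False,
                                   storage_maximum=None):
        # Optional guard: reject an expansion that can never be satisfied
        # instead of letting the PVC patch sit and retry indefinitely.
        if storage_maximum is not None and size_in_byte > storage_maximum:
            raise ValueError(f"Requested size {size_in_byte} exceeds "
                             f"storage maximum {storage_maximum}")
        claim_name = get_workload_persistent_volume_claim_name(
            workload_name, index=claim_index)  # helper name assumed
        self.claim.expand(claim_name, size_in_byte, skip_retry=skip_retry)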

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 39e61f5 and f8ae949.

📒 Files selected for processing (23)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/statefulset.resource (0 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/backup/backup.py (1 hunks)
  • e2e/libs/backup/base.py (1 hunks)
  • e2e/libs/backup/rest.py (1 hunks)
  • e2e/libs/backupstore/base.py (1 hunks)
  • e2e/libs/backupstore/nfs.py (1 hunks)
  • e2e/libs/backupstore/s3.py (1 hunks)
  • e2e/libs/engine_image/engine_image.py (1 hunks)
  • e2e/libs/host/aws.py (1 hunks)
  • e2e/libs/host/harvester.py (1 hunks)
  • e2e/libs/k8s/k8s.py (1 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (3 hunks)
  • e2e/libs/keywords/workload_keywords.py (2 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/libs/persistentvolumeclaim/crd.py (1 hunks)
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (4 hunks)
  • e2e/libs/utility/utility.py (5 hunks)
  • e2e/libs/volume/crd.py (0 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
💤 Files with no reviewable changes (2)
  • e2e/keywords/statefulset.resource
  • e2e/libs/volume/crd.py
✅ Files skipped from review due to trivial changes (5)
  • e2e/libs/backupstore/base.py
  • e2e/libs/backupstore/nfs.py
  • e2e/libs/host/harvester.py
  • e2e/libs/engine_image/engine_image.py
  • e2e/libs/k8s/k8s.py
🧰 Additional context used
🪛 Ruff (0.8.0)
e2e/libs/keywords/volume_keywords.py

2-2: json imported but unused

Remove unused import: json

(F401)


360-360: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

e2e/libs/utility/utility.py

355-355: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

e2e/libs/keywords/workload_keywords.py

33-33: Redefinition of unused convert_size_to_bytes from line 31

Remove definition: convert_size_to_bytes

(F811)

🔇 Additional comments (18)
e2e/libs/host/aws.py (1)

3-6: LGTM! Clean import organization

The import statements are well-organized and the addition of NODE_REBOOT_DOWN_TIME_SECOND from constants improves code maintainability.

e2e/libs/keywords/persistentvolumeclaim_keywords.py (2)

23-25: LGTM!

The create_persistentvolumeclaim method correctly includes the storage_size parameter and properly passes it to the creation function.


38-40: LGTM!

The get_claim_requested_size method accurately retrieves the requested storage size from the claim's specification.
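For reference, a hedged sketch of what such a getter typically reduces to against the Kubernetes PVC object (attribute names per the core API; the actual implementation may differ):

    def get_claim_requested_size(self, claim_name):
        claim = self.claim.get(claim_name)  # attribute name assumed
        # spec.resources.requests['storage'] is the size the user asked for;
        # status.capacity['storage'] would be the size actually provisioned.
        return claim.spec.resources.requests['storage']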

e2e/tests/regression/test_persistentvolumeclaim.robot (1)

22-44: LGTM!

The test case effectively verifies that a PersistentVolumeClaim cannot be expanded beyond its maximum storage size and that data integrity is maintained throughout the process.

e2e/libs/backup/backup.py (1)

4-4: Clarify the necessity of changing the default strategy

The change from CRD to REST strategy seems unrelated to the PR's objective of testing PVC expansion. Could you explain why this change is necessary?

Also applies to: 9-9

e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (1)

27-29: LGTM: Storage size parameter addition

The addition of the storage_size parameter with size conversion is well implemented.

Also applies to: 44-46

e2e/libs/persistentvolumeclaim/crd.py (1)

87-89: LGTM: Skip retry implementation

The skip_retry parameter is well implemented, allowing tests to verify immediate failure scenarios.

e2e/libs/node/node.py (2)

8-8: LGTM!

The import of LONGHORN_NAMESPACE constant is appropriate for the namespace-specific node handling logic.


85-88: LGTM! The namespace-specific node retrieval logic is well-implemented.

The conditional logic appropriately handles different node types based on the namespace, as sketched after the list:

  • Returns a Longhorn client node for the Longhorn namespace
  • Falls back to Kubernetes core API node for other namespaces
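
A minimal sketch of that branching, assuming the usual longhorn-tests helpers (get_longhorn_client, a CoreV1Api handle; attribute names are assumptions). Only the Longhorn node CR exposes the disks map that the new storage-maximum lookup walks:

    def get_node_by_name(self, node_name, namespace="default"):
        if namespace == LONGHORN_NAMESPACE:
            # Longhorn node CR: carries disks, scheduledReplica, storageMaximum.
            return get_longhorn_client().by_id_node(node_name)
        # Plain Kubernetes node object for every other caller.
        return self.core_v1_api.read_node(node_name)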
e2e/libs/backupstore/s3.py (1)

6-11: LGTM! Import statements are well-organized.

The imports follow Python's recommended organization:

  1. Standard library imports
  2. Third-party imports (minio)
  3. Local application imports (backupstore.base)

This organization improves code readability and maintainability.

e2e/libs/backup/rest.py (3)

1-11: LGTM! Import statements are well-organized and complete.

The imports are properly organized and include all necessary modules for the enhanced error handling and logging functionality.


Line range hint 85-107: Excellent error handling and logging improvements.

The enhanced error handling in get_from_list now:

  • Catches specific KeyError exceptions
  • Provides detailed error messages
  • Includes backup name in error logs
  • Handles unexpected errors gracefully

This improvement aids in debugging and troubleshooting backup-related issues.


Line range hint 108-139: Well-implemented logging for backup operations.

The additional logging statements in get_by_snapshot and get_backup_volume provide better visibility into the backup process:

  • Clear progress indicators
  • Detailed error messages
  • Helpful context information

This enhancement makes it easier to track and debug backup operations.

e2e/libs/keywords/volume_keywords.py (1)

343-345: LGTM!

The method correctly retrieves the volume size from the volume spec, which will be useful for PVC expansion testing.

e2e/keywords/workload.resource (3)

217-222: LGTM! Volume expansion keyword implementation.

The implementation correctly:

  1. Generates workload name
  2. Converts size to bytes
  3. Calls expand_workload_claim_size with proper parameters

223-230: LGTM! Test case for expansion beyond max size.

The implementation properly:

  1. Gets the maximum storage size from the node
  2. Attempts expansion to max_size + 1
  3. Expects and handles the failure case
  4. Uses skip_retry=True to avoid unnecessary retries

232-240: LGTM! Volume size assertion implementation.

The implementation correctly:

  1. Verifies size remains constant for the specified period
  2. Converts expected size to bytes for comparison
  3. Uses proper sleep interval for polling
e2e/libs/keywords/workload_keywords.py (1)

163-169: LGTM! Method changes align well with testing PVC expansion limits

The refactored method provides better support for testing PVC expansion scenarios, including:

  • More precise size control using bytes
  • Current size retrieval for validation
  • Skip retry option for testing failure cases

These changes align perfectly with the PR objective of testing PVC expansion beyond maximum storage size.

Let's verify the method's usage in test cases:

✅ Verification successful

Method changes are properly integrated and used correctly in test cases

The verification shows that:

  • The method is used correctly in test scenarios in e2e/keywords/workload.resource
  • It's used for both normal expansion cases and negative test cases (testing failure scenarios)
  • All usages have been updated to work with bytes instead of MiB
  • The new skip_retry parameter is properly utilized in failure test cases
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find test cases using this method to ensure proper migration
# from the old method name and parameter types

# Search for both old and new method names
echo "Searching for method usage patterns..."
rg -A 2 "expand_workload_claim_size(_by_mib)?\s*\("

# Look for any remaining MiB size calculations that might need updating
echo "Checking for MiB calculations that might need updating..."
rg "MEBIBYTE.*expand_workload_claim"

Length of output: 723


Script:

#!/bin/bash
# Let's search for any usage of this method in test files
echo "Searching for test files using expand_workload_claim_size..."
rg -l "expand_workload_claim_size" e2e/test_suites/

# Let's also check for any references to the old method name
echo "Checking for any remaining references to old method name..."
rg "expand_workload_claim_size_by_mib" 

# Let's look for any test files modified in this PR that might use this method
echo "Checking modified test files in this PR..."
git diff --name-only HEAD~1 | grep "test_.*\.py$"

Length of output: 554


Script:

#!/bin/bash
# Let's first find the correct test directory structure
echo "Finding test directories..."
fd -t d "test"

# Then search for any usage of this method in the entire codebase
echo "Searching for all usages of expand_workload_claim_size..."
rg -B 2 -A 2 "expand_workload_claim_size"

# Look for any git changes related to this method
echo "Checking git changes related to this method..."
git log -p --all | rg "expand_workload_claim_size"

Length of output: 2042

Resolved review threads:
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (outdated)
  • e2e/libs/keywords/volume_keywords.py (outdated)
  • e2e/libs/utility/utility.py (outdated)
  • e2e/libs/keywords/workload_keywords.py
@c3y1huang force-pushed the 6633-prevent-resize-stuck branch from f8ae949 to 07a8ed9 on December 6, 2024 00:33

@c3y1huang (Collaborator, Author) commented Dec 6, 2024

@coderabbitai review


coderabbitai bot commented Dec 6, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
e2e/keywords/workload.resource (2)

232-240: Consider improvements to size assertion implementation

The size assertion implementation could be enhanced:

  1. Consider making the sleep interval configurable rather than hardcoded to 1 second
  2. Add validation for the input parameters (period should be positive integer)
  3. Consider using Robot Framework's built-in timing keywords

Here's a suggested improvement:

 Assert volume size of ${workload_kind} ${workload_id} remains ${size} for at least ${period} seconds
     ${workload_name} =   generate_name_with_suffix    ${workload_kind}    ${workload_id}
     ${volume_name} =    get_workload_volume_name    ${workload_name}
+    # Validate input parameters
+    ${period} =    Convert To Integer    ${period}
+    Should Be True    ${period} > 0    Period must be a positive integer
+    
+    # Use configurable interval (default 1 second)
+    ${interval} =    Set Variable    ${INTERVAL}    # Define INTERVAL in variables section
     FOR    ${i}    IN RANGE    ${period}
         ${expected_size_byte} =    convert_size_to_bytes    ${size}    to_str=True
         ${current_size_byte} =    get_volume_size    ${volume_name}
         Should Be Equal    ${current_size_byte}    ${expected_size_byte}
-        Sleep    1
+        Sleep    ${interval}
     END

217-240: Add documentation for new keywords

Please add documentation strings for the new keywords to describe:

  • Purpose of each keyword
  • Expected parameters and their formats
  • Expected behavior and failure conditions

Example:

+[Documentation]    Expands the volume of specified workload by the given size.
+...               Fails if the expansion is not possible.
+...               
+...               Arguments:
+...               - workload_kind: Type of the workload (e.g., pod, deployment)
+...               - workload_id: Identifier of the workload
+...               - size: Size to expand to (e.g., "1Gi", "500Mi")
 Expand ${workload_kind} ${workload_id} volume by ${size}
     ${workload_name} =   generate_name_with_suffix    ${workload_kind}    ${workload_id}
     ${new_size} =    convert_size_to_bytes    ${size}
e2e/libs/keywords/workload_keywords.py (1)

161-167: Add parameter documentation and type hints

The implementation looks good and aligns with the PR objectives for testing PVC expansion beyond limits. Consider adding:

  1. Type hints for parameters
  2. Documentation for the skip_retry parameter
-    def expand_workload_claim_size(self, workload_name, size_in_byte, claim_index=0, skip_retry=False):
+    def expand_workload_claim_size(
+        self,
+        workload_name: str,
+        size_in_byte: int,
+        claim_index: int = 0,
+        skip_retry: bool = False
+    ) -> None:
+        """Expand the PVC size for a workload.
+        
+        Args:
+            workload_name: Name of the workload
+            size_in_byte: Target size in bytes
+            claim_index: Index of the PVC to expand
+            skip_retry: If True, doesn't retry on expansion failure
+        """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f8ae949 and 07a8ed9.

📒 Files selected for processing (12)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/statefulset.resource (0 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (3 hunks)
  • e2e/libs/keywords/workload_keywords.py (2 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/libs/persistentvolumeclaim/crd.py (1 hunks)
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (4 hunks)
  • e2e/libs/utility/utility.py (5 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
💤 Files with no reviewable changes (1)
  • e2e/keywords/statefulset.resource
🚧 Files skipped from review as they are similar to previous changes (6)
  • e2e/libs/node/node.py
  • e2e/libs/persistentvolumeclaim/crd.py
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py
  • e2e/tests/regression/test_persistentvolumeclaim.robot
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py
👮 Files not reviewed due to content moderation or server errors (3)
  • e2e/keywords/persistentvolumeclaim.resource
  • e2e/libs/utility/utility.py
  • e2e/libs/keywords/volume_keywords.py
🧰 Additional context used
🪛 Ruff (0.8.0)
e2e/libs/keywords/volume_keywords.py

2-2: json imported but unused

Remove unused import: json

(F401)

🔇 Additional comments (4)
e2e/keywords/workload.resource (2)

201-204: LGTM: Formatting improvements

The spacing adjustments in variable assignments improve code readability while maintaining the same functionality.


217-221: LGTM: Volume expansion keywords implement required test scenarios

The implementation correctly handles both normal expansion and testing of expansion beyond maximum size. The use of skip_retry=True ensures proper failure handling.

Let's verify the implementation of the called functions:

Also applies to: 223-230

✅ Verification successful

Volume expansion implementation is correctly integrated with required dependencies

The verification confirms that all required functions are properly implemented:

  • convert_size_to_bytes handles size conversion with GiB units
  • expand_workload_claim_size retrieves current size and handles expansion
  • get_volume_node_disk_storage_maximum correctly fetches storage limits from node disks
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of volume expansion related functions

# Check for the implementation of convert_size_to_bytes
rg -A 5 "def convert_size_to_bytes"

# Check for the implementation of expand_workload_claim_size
rg -A 5 "def expand_workload_claim_size"

# Check for the implementation of get_volume_node_disk_storage_maximum
rg -A 5 "def get_volume_node_disk_storage_maximum"

Length of output: 2062

e2e/libs/keywords/workload_keywords.py (2)

31-31: LGTM! Import statement is correctly placed

The import of convert_size_to_bytes is properly organized and not duplicated anymore.


166-167: Verify error handling in the PersistentVolumeClaim expand method

Since this method is specifically for testing PVC expansion beyond limits, we should verify that the underlying expand method properly handles expansion failures.

✅ Verification successful

Let me generate additional verification scripts to check the error handling in the PersistentVolumeClaim expand method.


Let me search for the PersistentVolumeClaim class implementation since it wasn't found in the previous search.


Let me search for the CRD class implementation since it appears to handle the actual expansion operation.


Based on the search results, I can now generate the final response regarding the error handling in the PersistentVolumeClaim expand method.

Error handling for PVC expansion is properly implemented

The PersistentVolumeClaim expansion implementation includes proper error handling through:

  1. A retry mechanism with configurable retry count and interval
  2. Proper error propagation from the underlying Kubernetes API calls
  3. Skip retry option for testing scenarios that need to verify immediate failures

The implementation in the CRD class handles expansion errors by:

def expand(self, claim_name, size, namespace="default", skip_retry=False):
    retry_count = 1 if skip_retry else self.retry_count
    for i in range(retry_count):
        logging(f"Trying to expand pvc {claim_name} to size {size} ... ({i})")
        try:
            self.core_v1_api.patch_namespaced_persistent_volume_claim(...)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check error handling in PersistentVolumeClaim expand method
# Look for error handling patterns in the expand method

# Search for the expand method implementation
ast-grep --pattern 'def expand($_, $_, skip_retry=$_) {
  $$$
}'

# Search for error handling patterns
rg -A 5 'raise|except|error|fail' --type py

Length of output: 74112


Script:

#!/bin/bash
# Search for the PersistentVolumeClaim class and expand method implementation
ast-grep --pattern 'class PersistentVolumeClaim {
  $$$
  expand($_, $_, $_) {
    $$$
  }
  $$$
}'

# Search for error handling patterns specifically in workload_keywords.py and related files
rg -A 5 'try|except|raise|error|fail' --type py e2e/libs/keywords/workload_keywords.py e2e/libs/keywords/persistentvolumeclaim.py

Length of output: 2075


Script:

#!/bin/bash
# Find the PersistentVolumeClaim class implementation
fd -t f -e py persistentvolumeclaim --exec rg -A 10 'class PersistentVolumeClaim' {}

# Search for expand method implementation
fd -t f -e py persistentvolumeclaim --exec rg -A 10 'def expand.*skip_retry' {}

# Look for error handling in the found files
fd -t f -e py persistentvolumeclaim --exec rg -A 5 'try|except|raise|error|fail' {}

Length of output: 2334


Script:

#!/bin/bash
# Find the CRD class implementation and its expand method
fd -t f -e py -e yaml -e json . --exec rg -A 10 'class CRD.*:' {}

# Look for expand method in CRD class
fd -t f -e py . --exec rg -A 10 'def expand.*skip_retry' {}

# Search for error handling in CRD operations
fd -t f -e py . --exec rg -A 5 'try|except|raise|error|fail.*CRD' {}

# Search for PVC expansion error handling patterns
rg -A 5 'try|except|raise|error|fail.*expand' --type py

Length of output: 69963

@c3y1huang marked this pull request as ready for review on December 6, 2024 00:42
@c3y1huang requested a review from a team as a code owner on December 6, 2024 00:42
Comment on lines 22 to 45
Test persistentvolumeclaim expand more than storage maximum size should fail
    [Tags]    volume expansion
    [Documentation]    Verify that a PersistentVolumeClaim cannot be expanded beyond
    ...                the storage maximum size.
    ...
    ...                Issue: https://github.com/longhorn/longhorn/issues/6633

    Given Set setting storage-over-provisioning-percentage to 100
    And Create persistentvolumeclaim 0 using RWX volume    storage_size=2GiB
    And Create deployment 0 with persistentvolumeclaim 0
    And Write 10 MB data to file data.txt in deployment 0

    FOR    ${i}    IN RANGE    ${LOOP_COUNT}
        When Expand deployment 0 volume more than storage maximum size should fail
        Then Assert volume size of deployment 0 remains 2GiB for at least 5 seconds
        And Assert persistentvolumeclaim 0 requested size remains 2GiB for at least 5 seconds
        And Check deployment 0 data in file data.txt is intact

        When Expand deployment 0 volume by 1 GiB
        And Assert persistentvolumeclaim 0 requested size remains 3GiB for at least 5 seconds
        And Assert volume size of deployment 0 remains 3GiB for at least 5 seconds
        And Check deployment 0 data in file data.txt is intact
    END
@c3y1huang (Collaborator, Author) commented Dec 6, 2024

cc @WebberHuang1118 for expected behavior after fix.

@c3y1huang force-pushed the 6633-prevent-resize-stuck branch from 07a8ed9 to f3021f0 on December 6, 2024 00:45
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
e2e/keywords/workload.resource (2)

217-221: Consider adding documentation for the new keyword

The implementation looks good and aligns with the PR objectives. Consider adding documentation to describe the expected format of the size parameter (e.g., "10Gi", "1000Mi").


232-240: Consider adding custom error messages

The size assertion implementation is solid. Consider enhancing the error message in Should Be Equal to include both the current and expected sizes for better debugging:

-        Should Be Equal    ${current_size_byte}    ${expected_size_byte}
+        Should Be Equal    ${current_size_byte}    ${expected_size_byte}    
+        ...    msg=Volume size mismatch after ${i}s: expected=${expected_size_byte}, got=${current_size_byte}
e2e/libs/keywords/volume_keywords.py (1)

2-2: Remove unused import

The json module is imported but never used in this file.

-import json
🧰 Tools
🪛 Ruff (0.8.0)

2-2: json imported but unused

Remove unused import: json

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 07a8ed9 and f3021f0.

📒 Files selected for processing (7)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (3 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • e2e/tests/regression/test_persistentvolumeclaim.robot
  • e2e/libs/node/node.py
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py
  • e2e/keywords/persistentvolumeclaim.resource
🧰 Additional context used
🪛 Ruff (0.8.0)
e2e/libs/keywords/volume_keywords.py

2-2: json imported but unused

Remove unused import: json

(F401)

🔇 Additional comments (4)
e2e/keywords/workload.resource (2)

201-204: LGTM: Formatting improvements

The whitespace adjustments improve code readability while maintaining the same functionality.


223-230: Verify specific error message content

The implementation correctly tests PVC expansion beyond maximum size. Consider enhancing the error message validation to ensure it contains specific details about the maximum size limit.

✅ Verification successful

Error message validation is appropriate as implemented

The current implementation using Failed to expand* as the expected error pattern follows a good practice by:

  1. Using a wildcard (*) which allows for dynamic content after the base message
  2. Focusing on the essential error condition without over-constraining the exact message

Since we couldn't find other error message patterns in the codebase for comparison, the current implementation appears to be appropriate and sufficient for the test case.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify error message format in similar test cases
rg -l 'Failed to expand' --type robot

Length of output: 39


Script:

#!/bin/bash
# Let's try a broader search for error message patterns in robot files
rg -A 3 'Run Keyword And Expect Error.*expand' --type robot

Length of output: 61


Script:

#!/bin/bash
# Let's search for any error message patterns in robot files to understand the convention
rg -A 2 'Run Keyword And Expect Error' --type robot

Length of output: 53

e2e/libs/keywords/volume_keywords.py (2)

343-345: LGTM!

The implementation correctly retrieves the volume size from the volume spec.


347-360: 🛠️ Refactor suggestion

Several improvements needed for robustness

The method implementation could be enhanced for better error handling and logging:

  1. Add length check for replica_list before accessing first element
  2. Use a more specific exception type
  3. Add logging when starting the search
  4. Improve error message clarity

Consider applying these improvements:

 def get_volume_node_disk_storage_maximum(self, volume_name, node_name):
+    logging(f"Searching for maximum storage capacity for volume {volume_name} on node {node_name}")
     replica_list = self.replica.get(volume_name, node_name)
+    if not replica_list:
+        raise ValueError(f"No replicas found for volume {volume_name} on node {node_name}")
     replica = replica_list[0]
     replica_name = replica['metadata']['name']
     node = self.node.get_node_by_name(node_name, namespace=LONGHORN_NAMESPACE)
     for diskName in node.disks:
         disk = node.disks[diskName]

         for scheduledReplica in disk['scheduledReplica']:
             if scheduledReplica == replica_name:
                 logging(f"Found replica {scheduledReplica} on node {node_name} scheduled to disk {diskName}")
                 return disk['storageMaximum']

-    raise Exception(f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}")
+    raise ValueError(
+        f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}. "
+        f"This could indicate that the replica is not scheduled to any disk."
+    )

Likely invalid or redundant comment.


mergify bot commented Dec 6, 2024

This pull request is now in conflict. Could you fix it @c3y1huang? 🙏

@c3y1huang force-pushed the 6633-prevent-resize-stuck branch from f3021f0 to ed9c48e on December 6, 2024 03:09
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (8)
e2e/tests/regression/test_persistentvolumeclaim.robot (1)

29-44: LGTM: Comprehensive test implementation with a suggestion

The test implementation thoroughly covers both failure and success scenarios for volume expansion, including data integrity verification. However, consider increasing the ${LOOP_COUNT} from 1 to a higher value (e.g., 3) to better stress test the functionality and ensure consistent behavior across multiple attempts.

-${LOOP_COUNT}    1
+${LOOP_COUNT}    3
e2e/keywords/workload.resource (1)

227-250: LGTM: Well-implemented keywords with a suggestion

The new keywords provide a robust implementation for testing volume expansion scenarios. The implementation includes proper size conversion, error handling, and continuous size verification.

Consider making the error message verification more specific to ensure we're catching the exact expected error:

-    Run Keyword And Expect Error    Failed to expand*    expand_workload_claim_size    ${workload_name}    ${new_size}    skip_retry=True
+    Run Keyword And Expect Error    Failed to expand volume: requested size * exceeds available storage*    expand_workload_claim_size    ${workload_name}    ${new_size}    skip_retry=True
e2e/libs/utility/utility.py (2)

Line range hint 179-187: Add retry logic to filter_cr function

The function should include retry logic similar to get_all_crs to handle transient API errors and maintain consistency with other functions in the module.

 def filter_cr(group, version, namespace, plural, field_selector="", label_selector=""):
     api = client.CustomObjectsApi()
+    retry_count, retry_interval = get_retry_count_and_interval()
+    for _ in range(retry_count):
+        try:
+            resp = api.list_namespaced_custom_object(
+                group, version, namespace, plural,
+                field_selector=field_selector,
+                label_selector=label_selector)
+            return resp
+        except ApiException as e:
+            logging(f"Listing namespaced custom object error: {e}")
+        time.sleep(retry_interval)
-    try:
-        resp = api.list_namespaced_custom_object(group, version, namespace, plural, field_selector=field_selector, label_selector=label_selector)
-        return resp
-    except ApiException as e:
-        logging(f"Listing namespaced custom object: {e}")
+    raise ApiException("Failed to list namespaced custom objects after retries")

343-355: Consider adding support for more size units

The implementation is good and follows best practices. Consider enhancing it with:

  1. Support for additional units (TiB, KiB)
  2. Input validation for negative numbers
  3. Support for decimal values
 def convert_size_to_bytes(size):
     size = size.replace(" ", "")
 
+    if not size:
+        raise ValueError("Size cannot be empty")
+
+    # Extract numeric part and validate
+    numeric_part = size[:-3] if size.endswith(("GiB", "MiB", "TiB", "KiB")) else size
+    try:
+        value = float(numeric_part)
+        if value < 0:
+            raise ValueError("Size cannot be negative")
+    except ValueError:
+        raise ValueError(f"Invalid numeric value in size: {size}")
+
+    if size.endswith("TiB"):
+        return int(value * 1024 * 1024 * 1024 * 1024)
     if size.endswith("GiB"):
-        return int(size[:-3]) * 1024 * 1024 * 1024
+        return int(value * 1024 * 1024 * 1024)
     if size.endswith("MiB"):
-        return int(size[:-3]) * 1024 * 1024
+        return int(value * 1024 * 1024)
+    if size.endswith("KiB"):
+        return int(value * 1024)
 
     if size.isdigit():
         return int(size)
 
     raise ValueError(f"Invalid size format: {size}")
e2e/libs/backupstore/s3.py (3)

Line range hint 31-52: Security and resource management improvements needed in certificate handling.

Several issues need to be addressed in the certificate handling:

  1. Writing to current working directory may fail due to permissions
  2. The certificate file is not cleaned up after use
  3. Missing error handling for file operations

Consider applying these improvements:

     def get_api_client(self, minio_secret_name):
         secret = self.core_api.read_namespaced_secret(name=minio_secret_name,
                                                       namespace='longhorn-system')
         # ... existing code ...
-        minio_cert_file_path = os.path.join(os.getcwd(), "minio_cert.crt")
-        with open(minio_cert_file_path, 'w') as minio_cert_file:
-            base64_minio_cert = \
-                base64.b64decode(base64_minio_cert).decode("utf-8")
-            minio_cert_file.write(base64_minio_cert)
-
-        os.environ["SSL_CERT_FILE"] = minio_cert_file_path
+        try:
+            cert_dir = tempfile.mkdtemp()
+            minio_cert_file_path = os.path.join(cert_dir, "minio_cert.crt")
+            with open(minio_cert_file_path, 'w') as minio_cert_file:
+                base64_minio_cert = \
+                    base64.b64decode(base64_minio_cert).decode("utf-8")
+                minio_cert_file.write(base64_minio_cert)
+
+            os.environ["SSL_CERT_FILE"] = minio_cert_file_path
+            # ... rest of the method ...
+        finally:
+            if os.path.exists(cert_dir):
+                os.remove(minio_cert_file_path)
+                os.rmdir(cert_dir)

Line range hint 54-77: Add error handling and input validation for URL parsing.

The URL parsing methods could be more robust with proper error handling and input validation.

Consider these improvements:

     def get_backupstore_bucket_name(self):
         backupstore = self.backup_target
+        if not backupstore:
+            raise ValueError("Backup target URL is empty")
         assert self.is_backupTarget_s3(backupstore)
-        bucket_name = urlparse(backupstore).netloc.split('@')[0]
+        try:
+            parsed_url = urlparse(backupstore)
+            if not parsed_url.netloc:
+                raise ValueError("Invalid backup target URL format")
+            bucket_name = parsed_url.netloc.split('@')[0]
+            return bucket_name
+        except Exception as e:
+            raise ValueError(f"Failed to parse backup target URL: {str(e)}")
-        return bucket_name

Line range hint 79-199: Improve process management and error handling in file operations.

Several operational improvements are needed:

  1. Port forwarding process cleanup in error cases
  2. Consistent error handling
  3. Proper logging instead of print statements

Consider applying these improvements:

     def create_file_in_backupstore(self, file_path, data={}):
-        process = self.port_forward()
+        process = None
+        try:
+            process = self.port_forward()
+            secret_name = self.secret
+            if not secret_name:
+                raise ValueError("Secret name is required")
 
-        secret_name = self.secret
+            minio_api = self.get_api_client(secret_name)
+            bucket_name = self.get_backupstore_bucket_name()
 
-        minio_api = self.get_api_client(secret_name)
-        bucket_name = self.get_backupstore_bucket_name()
+            if len(data) == 0:
+                data = {"testkey": "test data from create_file_in_backupstore()"}
 
-        if len(data) == 0:
-            data = {"testkey": "test data from create_file_in_backupstore()"}
+            with tempfile.NamedTemporaryFile('w') as fp:
+                json.dump(data, fp)
+                fp.flush()
+                with open(fp.name, mode='rb') as f:
+                    temp_file_stat = os.stat(fp.name)
+                    minio_api.put_object(bucket_name,
+                                       file_path,
+                                       f,
+                                       temp_file_stat.st_size)
+                    read_back = minio_api.get_object(bucket_name,
+                                                   file_path)
+                    assert read_back.data.decode("utf-8") == json.dumps(data)
+                    logging(f"Created file {file_path} in backupstore")
+        except ResponseError as err:
+            logging(f"Error in create_file_in_backupstore: {str(err)}")
+            raise
+        except Exception as e:
+            logging(f"Unexpected error in create_file_in_backupstore: {str(e)}")
+            raise
+        finally:
+            if process:
+                process.kill()

Similar improvements should be applied to other file operation methods for consistency.

e2e/libs/keywords/volume_keywords.py (1)

2-2: Remove unused import

The json module is imported but never used in the code.

-import json
🧰 Tools
🪛 Ruff (0.8.0)

2-2: json imported but unused

Remove unused import: json

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f3021f0 and ed9c48e.

📒 Files selected for processing (23)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/statefulset.resource (0 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/backup/backup.py (1 hunks)
  • e2e/libs/backup/base.py (1 hunks)
  • e2e/libs/backup/rest.py (1 hunks)
  • e2e/libs/backupstore/base.py (1 hunks)
  • e2e/libs/backupstore/nfs.py (1 hunks)
  • e2e/libs/backupstore/s3.py (1 hunks)
  • e2e/libs/engine_image/engine_image.py (1 hunks)
  • e2e/libs/host/aws.py (1 hunks)
  • e2e/libs/host/harvester.py (1 hunks)
  • e2e/libs/k8s/k8s.py (1 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (3 hunks)
  • e2e/libs/keywords/workload_keywords.py (2 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/libs/persistentvolumeclaim/crd.py (1 hunks)
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (4 hunks)
  • e2e/libs/utility/utility.py (5 hunks)
  • e2e/libs/volume/crd.py (0 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
💤 Files with no reviewable changes (2)
  • e2e/libs/volume/crd.py
  • e2e/keywords/statefulset.resource
🚧 Files skipped from review as they are similar to previous changes (16)
  • e2e/libs/backupstore/base.py
  • e2e/libs/backupstore/nfs.py
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/engine_image/engine_image.py
  • e2e/libs/persistentvolumeclaim/crd.py
  • e2e/libs/backup/backup.py
  • e2e/keywords/persistentvolumeclaim.resource
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py
  • e2e/libs/host/harvester.py
  • e2e/libs/k8s/k8s.py
  • e2e/libs/backup/rest.py
  • e2e/libs/node/node.py
  • e2e/libs/host/aws.py
  • e2e/libs/keywords/workload_keywords.py
  • e2e/libs/backup/base.py
🧰 Additional context used
🪛 Ruff (0.8.0)
e2e/libs/keywords/volume_keywords.py

2-2: json imported but unused

Remove unused import: json

(F401)

🔇 Additional comments (10)
e2e/tests/regression/test_persistentvolumeclaim.robot (3)

1-11: LGTM: Test setup and resources are well-structured

The test file has proper documentation, regression tagging, and includes all necessary resources for PVC, deployment, and workload management.


12-19: LGTM: Test configuration is appropriate

The setup/teardown procedures are in place, and the retry parameters (300 retries with 1-second intervals) are suitable for volume operations that may take time to complete.


22-28: LGTM: Test documentation is clear and well-linked

The test case documentation clearly states its purpose and links to the relevant issue (#6633), providing good traceability.

e2e/keywords/workload.resource (1)

202-205: Formatting changes only

These are purely formatting changes that don't affect functionality.

e2e/libs/utility/utility.py (2)

Line range hint 188-202: Replace assertion with explicit exception

Replace assert False with a proper exception to maintain consistency with error handling patterns and prevent issues when running Python with -O flag.

     for i in range(retry_count):
         try:
             resp = api.list_namespaced_pod(
                 namespace=namespace,
                 label_selector=label_selector)
             return resp.items
         except Exception as e:
             logging(f"Failed to list namespaced {namespace} pods with error: {e}")
         time.sleep(retry_interval)
-    assert False, f"Failed to list namespaced {namespace} pods"
+    raise RuntimeError(f"Failed to list namespaced {namespace} pods after {retry_count} retries")
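
For context, a minimal standalone illustration (not part of the PR) of why assert-based failure paths are fragile: running the script below with python -O strips the assert entirely, so the failure goes unnoticed.

# demo_assert.py -- toy example, not from the PR under review.
# `python demo_assert.py` raises AssertionError;
# `python -O demo_assert.py` removes the assert and falls through.
def list_pods_or_fail():
    assert False, "Failed to list pods"

if __name__ == "__main__":
    try:
        list_pods_or_fail()
        print("assert was stripped by -O; the failure went unnoticed")
    except AssertionError as e:
        print(f"AssertionError: {e}")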

Line range hint 203-231: LGTM! Well-implemented conflict handling

The addition of conflict error handling with proper retry logic is a good improvement. The implementation correctly handles 409 conflicts while preserving other error cases.
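
The pattern being praised here -- retry only on HTTP 409 and re-raise everything else -- can be sketched roughly as follows, assuming the kubernetes Python client; patch_namespaced_custom_object stands in for whichever call the utility actually wraps.

import time
from kubernetes import client
from kubernetes.client.rest import ApiException

def patch_cr_with_conflict_retry(group, version, namespace, plural, name, body,
                                 retry_count=30, retry_interval=1):
    """Retry only on 409 Conflict; preserve all other API errors."""
    api = client.CustomObjectsApi()
    for _ in range(retry_count):
        try:
            return api.patch_namespaced_custom_object(
                group, version, namespace, plural, name, body)
        except ApiException as e:
            if e.status != 409:
                raise  # non-conflict errors are real failures
            time.sleep(retry_interval)  # another writer won; retry
    raise RuntimeError(f"{plural}/{name} still conflicting after {retry_count} retries")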

🧰 Tools
🪛 Ruff (0.8.0)

201-201: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

e2e/libs/backupstore/s3.py (1)

1-14: LGTM! Import statements and class definition are well organized.

The imports are logically grouped and the inheritance from Base class is properly defined.

e2e/libs/keywords/volume_keywords.py (3)

350-352: LGTM!

The implementation is clean and correctly retrieves the volume size from its specification.


354-367: 🛠️ Refactor suggestion

Improve robustness and readability

The method could benefit from several improvements:

  1. Add validation for empty replica list
  2. Use specific exceptions instead of bare Exception
  3. Simplify nested loops with early returns
  4. Add safer dictionary access

Consider this improved implementation:

 def get_volume_node_disk_storage_maximum(self, volume_name, node_name):
     replica_list = self.replica.get(volume_name, node_name)
+    if not replica_list:
+        raise ValueError(f"No replicas found for volume {volume_name} on node {node_name}")
+
     replica = replica_list[0]
     replica_name = replica['metadata']['name']
     node = self.node.get_node_by_name(node_name, namespace=LONGHORN_NAMESPACE)
+
     for diskName in node.disks:
-        disk = node.disks[diskName]
+        disk = node.disks.get(diskName, {})
+        scheduled_replicas = disk.get('scheduledReplica', [])
+        if replica_name in scheduled_replicas:
+            logging(f"Found replica {replica_name} on node {node_name} scheduled to disk {diskName}")
+            return disk.get('storageMaximum')
-        for scheduledReplica in disk['scheduledReplica']:
-            if scheduledReplica == replica_name:
-                logging(f"Found replica {scheduledReplica} on node {node_name} scheduled to disk {diskName}")
-                return disk['storageMaximum']

-    raise Exception(f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}")
+    raise ValueError(
+        f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}. "
+        "Ensure the replica is properly scheduled to a disk on the specified node."
+    )

Likely invalid or redundant comment.


350-367: Verify test coverage for new methods

The new methods appear to support testing PVC expansion beyond storage maximum size, but let's verify their usage in test cases.

@c3y1huang c3y1huang force-pushed the 6633-prevent-resize-stuck branch from ed9c48e to 7a9098b on December 6, 2024 03:15
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (4)
e2e/keywords/persistentvolumeclaim.resource (1)

27-34: Consider parameterizing the sleep duration

The sleep duration is hardcoded to 1 second. Consider making it configurable for different test scenarios.

 Assert persistentvolumeclaim ${claim_id} requested size remains ${size} for at least ${period} seconds
+    [Arguments]    ${sleep_duration}=1
     ${claim_name} =    generate_name_with_suffix    claim    ${claim_id}
     FOR    ${i}    IN RANGE    ${period}
         ${expected_size_byte} =    convert_size_to_bytes    ${size}    to_str=True
         ${current_size_byte} =    get_claim_requested_size    ${claim_name}
         Should Be Equal    ${current_size_byte}    ${expected_size_byte}
-        Sleep    1
+        Sleep    ${sleep_duration}
     END
e2e/keywords/workload.resource (2)

242-249: Consider validating the error message content.

The keyword correctly tests expansion failure, but could be more informative.

Consider capturing and validating the error message content:

-    Run Keyword And Expect Error    Failed to expand*    expand_workload_claim_size    ${workload_name}    ${new_size}    skip_retry=True
+    ${error_msg}=    Run Keyword And Expect Error    Failed to expand*    expand_workload_claim_size    ${workload_name}    ${new_size}    skip_retry=True
+    Should Contain    ${error_msg}    exceeds maximum size    msg=Expected error message about exceeding maximum size

251-259: Consider adding tolerance for size comparison.

The size verification is good but might be too strict.

Consider adding a small tolerance when comparing sizes to account for potential rounding differences:

-        Should Be Equal    ${current_size_byte}    ${expected_size_byte}
+        ${diff}=    Evaluate    abs(${current_size_byte} - ${expected_size_byte})
+        Should Be True    ${diff} < 1024    msg=Size difference exceeds 1KB tolerance
e2e/libs/keywords/volume_keywords.py (1)

2-2: Remove unused import.

The json import is not used in this file.

-import json
🧰 Tools
🪛 Ruff (0.8.0)

2-2: json imported but unused

Remove unused import: json

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ed9c48e and 7a9098b.

📒 Files selected for processing (23)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/statefulset.resource (0 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/backup/backup.py (1 hunks)
  • e2e/libs/backup/base.py (1 hunks)
  • e2e/libs/backup/rest.py (1 hunks)
  • e2e/libs/backupstore/base.py (1 hunks)
  • e2e/libs/backupstore/nfs.py (1 hunks)
  • e2e/libs/backupstore/s3.py (1 hunks)
  • e2e/libs/engine_image/engine_image.py (1 hunks)
  • e2e/libs/host/aws.py (1 hunks)
  • e2e/libs/host/harvester.py (1 hunks)
  • e2e/libs/k8s/k8s.py (1 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (3 hunks)
  • e2e/libs/keywords/workload_keywords.py (2 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/libs/persistentvolumeclaim/crd.py (1 hunks)
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (4 hunks)
  • e2e/libs/utility/utility.py (5 hunks)
  • e2e/libs/volume/crd.py (0 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
💤 Files with no reviewable changes (2)
  • e2e/keywords/statefulset.resource
  • e2e/libs/volume/crd.py
🚧 Files skipped from review as they are similar to previous changes (15)
  • e2e/libs/backupstore/base.py
  • e2e/libs/backupstore/nfs.py
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/node/node.py
  • e2e/libs/persistentvolumeclaim/crd.py
  • e2e/libs/keywords/workload_keywords.py
  • e2e/libs/backup/backup.py
  • e2e/libs/k8s/k8s.py
  • e2e/libs/host/harvester.py
  • e2e/libs/engine_image/engine_image.py
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py
  • e2e/libs/backupstore/s3.py
  • e2e/libs/backup/rest.py
  • e2e/libs/host/aws.py
  • e2e/libs/backup/base.py
👮 Files not reviewed due to content moderation or server errors (1)
  • e2e/libs/utility/utility.py
🧰 Additional context used
🪛 Ruff (0.8.0)
e2e/libs/keywords/volume_keywords.py

2-2: json imported but unused

Remove unused import: json

(F401)

🔇 Additional comments (7)
e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (2)

27-29: LGTM: Clean implementation of storage size parameter

The addition of the storage_size parameter with proper size conversion is well implemented.

Also applies to: 44-45


104-110: ⚠️ Potential issue

Add size validation before expansion

To prevent volume resize stuck issues (longhorn/longhorn#6633), consider adding validation before attempting expansion.

 def expand(self, claim_name, size_in_byte, skip_retry=False):
     pvc = self.claim.get(claim_name)
     current_size = int(pvc.spec.resources.requests['storage'])
     target_size = current_size + size_in_byte
+    # Validate target size against system capacity
+    if target_size > self._get_max_storage_size():
+        raise ValueError(f"Target size {target_size} exceeds maximum storage capacity")
     logging(f"Expanding PVC {claim_name} from {current_size} to {target_size}")
     expanded_size = self.claim.expand(claim_name, target_size, skip_retry=skip_retry)
e2e/keywords/persistentvolumeclaim.resource (1)

11-13: LGTM: Flexible PVC creation with config parameter

The addition of the config parameter allows for more flexible PVC creation scenarios.

e2e/tests/regression/test_persistentvolumeclaim.robot (1)

22-44: LGTM: Well-structured test case with comprehensive verification

The test case effectively validates:

  1. Failure when expanding beyond max size
  2. Success when expanding within limits
  3. Data integrity throughout the process

The BDD style makes the test flow clear and maintainable.

e2e/keywords/workload.resource (1)

236-240: LGTM! Clean implementation of volume expansion.

The keyword correctly converts the size and expands the volume.

e2e/libs/keywords/volume_keywords.py (2)

350-352: LGTM! Clean implementation of volume size retrieval.

The method correctly retrieves the volume size from the spec.


354-367: 🛠️ Refactor suggestion

Add input validation and use specific exceptions.

The method needs better error handling and input validation.

Consider these improvements:

 def get_volume_node_disk_storage_maximum(self, volume_name, node_name):
     replica_list = self.replica.get(volume_name, node_name)
+    if not replica_list:
+        raise ValueError(f"No replicas found for volume {volume_name} on node {node_name}")
     replica = replica_list[0]
     replica_name = replica['metadata']['name']
     node = self.node.get_node_by_name(node_name, namespace=LONGHORN_NAMESPACE)
     for diskName in node.disks:
         disk = node.disks[diskName]

         for scheduledReplica in disk['scheduledReplica']:
             if scheduledReplica == replica_name:
                 logging(f"Found replica {scheduledReplica} on node {node_name} scheduled to disk {diskName}")
                 return disk['storageMaximum']

-    raise Exception(f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}")
+    raise ValueError(
+        f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}. "
+        f"Ensure the replica is properly scheduled to a disk on the specified node."
+    )

Likely invalid or redundant comment.

@c3y1huang c3y1huang force-pushed the 6633-prevent-resize-stuck branch from 7a9098b to 20a04e4 on December 6, 2024 07:00
@c3y1huang c3y1huang requested a review from roger-ryao December 6, 2024 07:01
@c3y1huang c3y1huang changed the title from "chore(robot): test pvc expand more than storage maximum size" to "test(robot): expand PVC more than storage maximum size" on Dec 6, 2024
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (10)
e2e/tests/regression/test_persistentvolumeclaim.robot (2)

15-19: Consider documenting the variables.

Add comments explaining the purpose and impact of these variables:

  • LOOP_COUNT: Number of times to repeat the expansion test
  • RETRY_COUNT: Maximum number of retries for operations
  • RETRY_INTERVAL: Time to wait between retries

34-44: Consider adding validation for the initial data write.

The test writes data and verifies it remains intact after volume operations, but doesn't validate the initial write was successful.

Add a verification step after the initial data write:

    And Write 10 MB data to file data.txt in deployment 0
+   And Check deployment 0 data in file data.txt is intact

    FOR    ${i}    IN RANGE    ${LOOP_COUNT}
e2e/libs/utility/utility.py (1)

Line range hint 167-202: Consider adding error handling for empty responses.

The CR management functions should handle cases where the API returns empty responses.

Add null checks:

def get_all_crs(group, version, namespace, plural):
    api = client.CustomObjectsApi()
    retry_count, retry_interval = get_retry_count_and_interval()
    for _ in range(retry_count):
        try:
            resp = api.list_namespaced_custom_object(group, version, namespace, plural)
+           if not resp:
+               logging("Empty response from API")
+               return []
            return resp
e2e/keywords/workload.resource (3)

236-240: Consider adding documentation for the new keyword

The implementation looks good, but adding documentation would help users understand:

  • The expected format of the size parameter
  • Any prerequisites or post-conditions

Add documentation like this:

+[Documentation]    Expands the volume size of the specified workload.
+...               The size parameter should be in format like "1Gi", "500Mi", etc.
 Expand ${workload_kind} ${workload_id} volume by ${size}

242-249: Consider enhancing error message verification

The implementation correctly tests expansion beyond maximum size. However, the error message verification could be more specific.

Consider using a more specific error pattern:

-    Run Keyword And Expect Error    Failed to expand*    expand_workload_claim_size    ${workload_name}    ${new_size}    skip_retry=True
+    Run Keyword And Expect Error    Failed to expand*exceeds maximum storage size*    expand_workload_claim_size    ${workload_name}    ${new_size}    skip_retry=True

251-259: Consider enhancing failure message for better debugging

The implementation provides good verification of size stability. However, the error message could be more informative.

Consider adding more context to the size comparison:

-        Should Be Equal    ${current_size_byte}    ${expected_size_byte}
+        Should Be Equal    ${current_size_byte}    ${expected_size_byte}    msg=Volume ${volume_name} size changed: expected=${expected_size_byte}, got=${current_size_byte}
e2e/libs/keywords/volume_keywords.py (1)

2-2: Remove unused import

The json module is imported but never used in this file.

-import json
🧰 Tools
🪛 Ruff (0.8.0)

2-2: json imported but unused

Remove unused import: json

(F401)

e2e/libs/backup/rest.py (3)

Line range hint 20-43: Consider adding additional backup verification checks.

While the current implementation verifies backup completion and timestamp, consider adding checks for:

  • Backup size validation
  • Backup status verification (e.g., health check)
  • Storage capacity verification (aligning with PR objectives regarding volume resize issues)

Example addition:

 assert volume.lastBackupAt != "", \
     f"expect volume lastBackupAt is not empty, but it's {volume.lastBackupAt}"
+# Verify backup size and storage capacity
+assert backup.size > 0, f"expect backup size greater than 0, but got {backup.size}"
+assert backup.status == "Ready", f"expect backup status Ready, but got {backup.status}"

Line range hint 45-89: Enhance error handling in list method.

The list method's handling of None volume_name could be improved with proper error handling and logging.

Consider this improvement:

 def list(self, volume_name):
     if not volume_name:
         backup_list = []
         vol_list = self.volume.list()
+        logging(f"Listing backups for all volumes: {vol_list}")
         for volume_name in vol_list:
+            try:
                 backup_volume = self.get_backup_volume(volume_name)
                 if backup_volume:
                     backup_list.extend(backup_volume.backupList().data)
+            except Exception as e:
+                logging(f"Error listing backups for volume {volume_name}: {e}")
         return backup_list
     else:
+        logging(f"Listing backups for volume: {volume_name}")
         backup_volume = self.get_backup_volume(volume_name)
+        if not backup_volume:
+            logging(f"No backup volume found for {volume_name}")
+            return []
         return backup_volume.backupList().data

Line range hint 183-219: Add safety checks before cleanup operations.

While the cleanup methods handle errors well, consider adding safety checks to prevent disrupting ongoing operations.

Consider these additions:

 def cleanup_backup_volumes(self):
+    # Check for any ongoing backup operations
+    active_operations = self._check_active_operations()
+    if active_operations:
+        logging(f"Skipping cleanup due to active operations: {active_operations}")
+        return
+
     backup_volumes = get_longhorn_client().list_backup_volume()
     # we delete the whole backup volume, which skips block gc
     for backup_volume in backup_volumes:
         self.delete_backup_volume(backup_volume.name)

Also, consider adding a method to check for active operations:

def _check_active_operations(self):
    """Check for any ongoing backup or restore operations."""
    volumes = self.volume.list()
    active_ops = []
    for vol_name in volumes:
        vol = self.volume.get(vol_name)
        if any(status.state == "InProgress" for status in vol.backupStatus):
            active_ops.append(f"backup-{vol_name}")
    return active_ops
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7a9098b and 20a04e4.

📒 Files selected for processing (23)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/statefulset.resource (0 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/backup/backup.py (1 hunks)
  • e2e/libs/backup/base.py (1 hunks)
  • e2e/libs/backup/rest.py (1 hunks)
  • e2e/libs/backupstore/base.py (1 hunks)
  • e2e/libs/backupstore/nfs.py (1 hunks)
  • e2e/libs/backupstore/s3.py (1 hunks)
  • e2e/libs/engine_image/engine_image.py (1 hunks)
  • e2e/libs/host/aws.py (1 hunks)
  • e2e/libs/host/harvester.py (1 hunks)
  • e2e/libs/k8s/k8s.py (1 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (3 hunks)
  • e2e/libs/keywords/workload_keywords.py (2 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/libs/persistentvolumeclaim/crd.py (1 hunks)
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (4 hunks)
  • e2e/libs/utility/utility.py (5 hunks)
  • e2e/libs/volume/crd.py (0 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
💤 Files with no reviewable changes (2)
  • e2e/keywords/statefulset.resource
  • e2e/libs/volume/crd.py
🚧 Files skipped from review as they are similar to previous changes (15)
  • e2e/libs/backupstore/base.py
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/k8s/k8s.py
  • e2e/libs/node/node.py
  • e2e/libs/engine_image/engine_image.py
  • e2e/libs/persistentvolumeclaim/crd.py
  • e2e/keywords/persistentvolumeclaim.resource
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py
  • e2e/libs/backupstore/nfs.py
  • e2e/libs/backup/backup.py
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py
  • e2e/libs/host/harvester.py
  • e2e/libs/backupstore/s3.py
  • e2e/libs/host/aws.py
  • e2e/libs/backup/base.py
🧰 Additional context used
🪛 Ruff (0.8.0)
e2e/libs/keywords/volume_keywords.py

2-2: json imported but unused

Remove unused import: json

(F401)

🔇 Additional comments (9)
e2e/tests/regression/test_persistentvolumeclaim.robot (2)

1-14: LGTM! Well-structured test suite setup.

The test suite is properly configured with:

  • Clear documentation
  • Appropriate test tags
  • Required resources
  • Test setup/teardown hooks

22-28: LGTM! Well-documented test case.

The test case includes:

  • Clear tags for categorization
  • Comprehensive documentation
  • Reference to the related GitHub issue
e2e/libs/keywords/workload_keywords.py (1)

161-167: LGTM! Improved size handling in expansion method.

The method now:

  • Properly retrieves current size
  • Converts sizes to bytes for comparison
  • Supports retry control through skip_retry parameter
e2e/libs/utility/utility.py (1)

343-355: LGTM! Well-implemented size conversion function.

The function:

  • Handles common size formats (GiB, MiB)
  • Properly validates input
  • Uses appropriate error handling
e2e/keywords/workload.resource (1)

211-214: LGTM: Formatting improvements

The spacing adjustments improve code consistency.

e2e/libs/keywords/volume_keywords.py (2)

350-352: LGTM!

The implementation correctly retrieves the volume size from the volume spec.


354-367: 🛠️ Refactor suggestion

Several improvements needed for robustness

The method implementation needs the following improvements:

  1. Add validation for replica_list before accessing the first element
  2. Enhance error message with more context
  3. Consider simplifying the nested loops using list comprehension or helper methods

Consider applying these improvements:

 def get_volume_node_disk_storage_maximum(self, volume_name, node_name):
     replica_list = self.replica.get(volume_name, node_name)
+    if not replica_list:
+        raise ValueError(f"No replicas found for volume {volume_name} on node {node_name}")
     replica = replica_list[0]
     replica_name = replica['metadata']['name']
     node = self.node.get_node_by_name(node_name, namespace=LONGHORN_NAMESPACE)
-    for diskName in node.disks:
-        disk = node.disks[diskName]
-
-        for scheduledReplica in disk['scheduledReplica']:
-            if scheduledReplica == replica_name:
-                logging(f"Found replica {scheduledReplica} on node {node_name} scheduled to disk {diskName}")
-                return disk['storageMaximum']
+    # Find the disk containing our replica
+    matching_disks = [
+        (name, disk) for name, disk in node.disks.items()
+        if replica_name in disk['scheduledReplica']
+    ]
+    if matching_disks:
+        disk_name, disk = matching_disks[0]
+        logging(f"Found replica {replica_name} on node {node_name} scheduled to disk {disk_name}")
+        return disk['storageMaximum']

-    raise Exception(f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}")
+    raise ValueError(
+        f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}. "
+        f"Ensure the replica is properly scheduled to a disk on the specified node."
+    )

The improvements:

  1. Add validation for empty replica list
  2. Replace nested loops with a more readable list comprehension
  3. Use ValueError with a more descriptive error message
  4. Improve variable naming (camelCase to snake_case for consistency)

Likely invalid or redundant comment.

e2e/libs/backup/rest.py (2)

1-14: LGTM! Clean import organization and class setup.

The imports are well-organized, and the renaming of Snapshot to RestSnapshot helps avoid naming conflicts.


Line range hint 1-219: Verify alignment with PR objectives regarding PVC expansion testing.

While the changes improve backup handling, I don't see direct implementation of tests for PVC expansion beyond maximum size (as mentioned in PR objectives - longhorn/longhorn#6633). Consider adding specific test cases for:

  1. Attempting to expand PVC beyond available capacity
  2. Verifying proper error handling when expansion fails
  3. Ensuring volumes don't get stuck in resizing state
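
As a toy, self-contained model of those three checks (FakeClaim is hypothetical, not the Longhorn API; the real tests drive the cluster through Robot keywords):

class ExpansionError(Exception):
    pass

class FakeClaim:
    """Stand-in for a PVC with a hard storage ceiling."""
    def __init__(self, size, maximum):
        self.size, self.maximum = size, maximum
        self.resizing = False

    def expand(self, target):
        if target > self.maximum:  # 1. beyond capacity must fail
            raise ExpansionError(f"{target} exceeds maximum {self.maximum}")
        self.size = target

claim = FakeClaim(size=2 * 1024**3, maximum=5 * 1024**3)
try:
    claim.expand(claim.maximum + 1)
except ExpansionError as e:  # 2. failure surfaces a clear error
    print(f"rejected as expected: {e}")
assert claim.resizing is False  # 3. no lingering resizing state
assert claim.size == 2 * 1024**3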

Signed-off-by: Chin-Ya Huang <[email protected]>
longhorn/longhorn-6633

Signed-off-by: Chin-Ya Huang <[email protected]>
@c3y1huang c3y1huang force-pushed the 6633-prevent-resize-stuck branch from 20a04e4 to 38c8f66 on December 9, 2024 07:31
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (5)
e2e/libs/keywords/workload_keywords.py (1)

161-167: Consider adding size validation before expansion

The implementation correctly handles size conversion and expansion. However, to address the core issue from longhorn/longhorn#6633 about preventing volumes from getting stuck, consider adding validation to check if the requested size exceeds the maximum available capacity before attempting expansion.

 def expand_workload_claim_size(self, workload_name, size_in_byte, claim_index=0, skip_retry=False):
     claim_name = get_workload_persistent_volume_claim_name(workload_name, index=claim_index)
     current_size = self.persistentvolumeclaim.get(claim_name).spec.resources.requests['storage']
     current_size_byte = convert_size_to_bytes(current_size)
+    # Get maximum available capacity
+    max_size = self.persistentvolumeclaim.get_maximum_available_size()
+    if size_in_byte > max_size:
+        raise ValueError(f"Requested size {size_in_byte} exceeds maximum available capacity {max_size}")

     logging(f'Expanding {workload_name} persistentvolumeclaim {claim_name} from {current_size_byte} to {size_in_byte}')
     self.persistentvolumeclaim.expand(claim_name, size_in_byte, skip_retry=skip_retry)
e2e/libs/utility/utility.py (3)

Line range hint 179-187: Add retry logic for API call resilience.

The function should implement retry logic similar to other API calls in the file to handle transient errors.

 def filter_cr(group, version, namespace, plural, field_selector="", label_selector=""):
     api = client.CustomObjectsApi()
+    retry_count, retry_interval = get_retry_count_and_interval()
+    for _ in range(retry_count):
+        try:
+            resp = api.list_namespaced_custom_object(group, version, namespace, plural, field_selector=field_selector, label_selector=label_selector)
+            return resp
+        except ApiException as e:
+            logging(f"Listing namespaced custom object: {e}")
+        time.sleep(retry_interval)
-    try:
-        resp = api.list_namespaced_custom_object(group, version, namespace, plural, field_selector=field_selector, label_selector=label_selector)
-        return resp
-    except ApiException as e:
-        logging(f"Listing namespaced custom object: {e}")

Line range hint 188-202: Replace assert with proper exception handling.

The assert statement can be disabled with Python -O flag. Use an explicit exception instead.

     for i in range(retry_count):
         try:
             resp = api.list_namespaced_pod(
                 namespace=namespace,
                 label_selector=label_selector)
             return resp.items
         except Exception as e:
             logging(f"Failed to list namespaced {namespace} pods with error: {e}")
         time.sleep(retry_interval)
-    assert False, f"Failed to list namespaced {namespace} pods"
+    raise RuntimeError(f"Failed to list namespaced {namespace} pods after {retry_count} retries")

343-355: Consider supporting additional size formats.

The function handles common size formats well, but could be enhanced to support more formats like KiB and TiB for better flexibility.

 def convert_size_to_bytes(size):
     size = size.replace(" ", "")
 
+    if size.endswith("TiB"):
+        return int(size[:-3]) * 1024 * 1024 * 1024 * 1024
+
     if size.endswith("GiB"):
         return int(size[:-3]) * 1024 * 1024 * 1024
 
     if size.endswith("MiB"):
         return int(size[:-3]) * 1024 * 1024
 
+    if size.endswith("KiB"):
+        return int(size[:-3]) * 1024
+
     if size.isdigit():
         return int(size)
 
     raise ValueError(f"Invalid size format: {size}")
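
A quick usage sketch of this helper; the import path is an assumption based on the repository layout (e2e/libs is expected to be on PYTHONPATH):

from utility.utility import convert_size_to_bytes

assert convert_size_to_bytes("1GiB") == 1024 ** 3
assert convert_size_to_bytes("512MiB") == 512 * 1024 ** 2
assert convert_size_to_bytes("100") == 100  # plain byte counts pass through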
e2e/libs/backup/rest.py (1)

Line range hint 186-204: Consider improving retry logic for backup verification

While the retry mechanism is good, consider adding a maximum retry count and logging the current attempt number for better debugging.

Here's a suggested improvement:

     def assert_all_backups_before_uninstall_exist(self, backups_before_uninstall):
+        success = False
         for i in range(self.retry_count):
+            logging(f"Attempt {i + 1}/{self.retry_count} to verify backups")
             time.sleep(self.retry_interval)
             try:
                 current_backups = self.list_all()
                 assert len(current_backups["items"]) == len(backups_before_uninstall["items"])
+                success = True
                 break
             except Exception as e:
                 logging(f"{e}")
                 continue
 
+        if not success:
+            raise AssertionError("Failed to verify backups after maximum retries")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 20a04e4 and 38c8f66.

📒 Files selected for processing (23)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/statefulset.resource (0 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/backup/backup.py (1 hunks)
  • e2e/libs/backup/base.py (1 hunks)
  • e2e/libs/backup/rest.py (1 hunks)
  • e2e/libs/backupstore/base.py (1 hunks)
  • e2e/libs/backupstore/nfs.py (1 hunks)
  • e2e/libs/backupstore/s3.py (1 hunks)
  • e2e/libs/engine_image/engine_image.py (1 hunks)
  • e2e/libs/host/aws.py (1 hunks)
  • e2e/libs/host/harvester.py (1 hunks)
  • e2e/libs/k8s/k8s.py (1 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (3 hunks)
  • e2e/libs/keywords/workload_keywords.py (2 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/libs/persistentvolumeclaim/crd.py (1 hunks)
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (4 hunks)
  • e2e/libs/utility/utility.py (5 hunks)
  • e2e/libs/volume/crd.py (0 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
💤 Files with no reviewable changes (2)
  • e2e/keywords/statefulset.resource
  • e2e/libs/volume/crd.py
🚧 Files skipped from review as they are similar to previous changes (17)
  • e2e/libs/backupstore/base.py
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/node/node.py
  • e2e/tests/regression/test_persistentvolumeclaim.robot
  • e2e/libs/backupstore/nfs.py
  • e2e/libs/engine_image/engine_image.py
  • e2e/libs/host/harvester.py
  • e2e/libs/backup/backup.py
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py
  • e2e/libs/k8s/k8s.py
  • e2e/keywords/persistentvolumeclaim.resource
  • e2e/libs/host/aws.py
  • e2e/keywords/workload.resource
  • e2e/libs/backupstore/s3.py
  • e2e/libs/persistentvolumeclaim/crd.py
  • e2e/libs/backup/base.py
🧰 Additional context used
🪛 Ruff (0.8.0)
e2e/libs/keywords/volume_keywords.py

2-2: json imported but unused

Remove unused import: json

(F401)

🔇 Additional comments (9)
e2e/libs/keywords/workload_keywords.py (1)

31-31: Remove duplicate import statement

The convert_size_to_bytes function is imported twice.

e2e/libs/utility/utility.py (1)

Line range hint 167-178: LGTM!

The implementation includes proper error handling and retry logic for retrieving custom resources.

e2e/libs/keywords/volume_keywords.py (2)

350-352: LGTM!

The function correctly retrieves the volume size from the volume specification.


354-367: 🛠️ Refactor suggestion

Add input validation for replica list.

The function should validate that replica_list is not empty before accessing its first element to prevent IndexError.

 def get_volume_node_disk_storage_maximum(self, volume_name, node_name):
     replica_list = self.replica.get(volume_name, node_name)
+    if not replica_list:
+        raise ValueError(f"No replicas found for volume {volume_name} on node {node_name}")
     replica = replica_list[0]
     replica_name = replica['metadata']['name']
     node = self.node.get_node_by_name(node_name, namespace=LONGHORN_NAMESPACE)
     for diskName in node.disks:
         disk = node.disks[diskName]

         for scheduledReplica in disk['scheduledReplica']:
             if scheduledReplica == replica_name:
                 logging(f"Found replica {scheduledReplica} on node {node_name} scheduled to disk {diskName}")
                 return disk['storageMaximum']

-    raise Exception(f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}")
+    raise ValueError(
+        f"Failed to find storageMaximum for volume {volume_name} replica {replica_name} on node {node_name}. "
+        f"Ensure the replica is properly scheduled to a disk on the specified node."
+    )

Likely invalid or redundant comment.

e2e/libs/backup/rest.py (5)

1-11: LGTM: Import changes are well-organized

The imports are properly structured with clear aliases to avoid naming conflicts.


Line range hint 39-40: LGTM: Good addition of backup timestamp validation

The new assertion ensures that the backup timestamp is properly set after completion, which is a valuable validation step.


Line range hint 82-86: LGTM: Improved error handling with detailed logging

The enhanced error handling with specific logging for different error types will make debugging backup metadata issues much easier.


Line range hint 98-99: LGTM: Enhanced logging for backup operations

The added logging statements improve visibility into the backup retrieval process and help track potential issues.

Also applies to: 106-107


Line range hint 1-275: Verify coverage of PVC expansion testing

While the error handling and logging improvements are good, we should verify if this file adequately supports testing PVC expansion beyond maximum size as mentioned in the PR objectives (longhorn/longhorn#6633).

@c3y1huang c3y1huang force-pushed the 6633-prevent-resize-stuck branch from 38c8f66 to 43a1a09 on December 9, 2024 09:30
@c3y1huang c3y1huang requested a review from roger-ryao December 9, 2024 09:32
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (4)
e2e/tests/negative/stress_filesystem.robot (1)

Line range hint 1-72: Add test documentation for storage requirements

The test file should document the minimum storage requirements needed to run these tests successfully, especially given the repeated volume expansions.

Add to the Documentation section:

Documentation     Negative Test Cases
...               Prerequisites:
...               - Minimum available storage: (LOOP_COUNT * 100 MiB) + 5Gi per test
...               - Maximum storage size configuration should be defined
e2e/tests/regression/test_persistentvolumeclaim.robot (1)

2-2: Enhance test suite documentation.

Consider adding more details to the documentation, such as test prerequisites, expected behaviors, and any specific setup requirements.

-Documentation    PersistentVolumeClaim Test Cases
+Documentation    Test suite for validating PersistentVolumeClaim (PVC) behaviors.
+...             Prerequisites:
+...             - Kubernetes cluster with Longhorn installed
+...             - Storage over-provisioning enabled
+...             Expected behaviors:
+...             - PVC expansion should fail when requested size exceeds maximum storage
+...             - PVC expansion should succeed within storage limits
e2e/keywords/workload.resource (2)

236-256: Add documentation for volume expansion keywords.

The implementation is correct, but documentation would help users understand the purpose and usage of these keywords.

+[Documentation]    Expands the volume of a workload to a specific size.
+...               Arguments:
+...               - workload_kind: Type of the workload (e.g., deployment)
+...               - workload_id: Identifier of the workload
+...               - size: Target size (e.g., "2GiB")
 Expand ${workload_kind} ${workload_id} volume to ${size}
     ${workload_name} =   generate_name_with_suffix    ${workload_kind}    ${workload_id}
     ${new_size} =    convert_size_to_bytes    ${size}
     expand_workload_claim_size    ${workload_name}    ${new_size}

+[Documentation]    Expands the volume of a workload by adding additional size.
+...               Arguments:
+...               - workload_kind: Type of the workload (e.g., deployment)
+...               - workload_id: Identifier of the workload
+...               - size: Additional size to add (e.g., "1GiB")
 Expand ${workload_kind} ${workload_id} volume with additional ${size}
     ${workload_name} =   generate_name_with_suffix    ${workload_kind}    ${workload_id}
     ${new_size} =    convert_size_to_bytes    ${size}
     expand_workload_claim_size_with_additional_bytes    ${workload_name}    ${new_size}

+[Documentation]    Verifies that volume expansion fails when exceeding maximum size.
+...               Arguments:
+...               - workload_kind: Type of the workload (e.g., deployment)
+...               - workload_id: Identifier of the workload
 Expand ${workload_kind} ${workload_id} volume more than storage maximum size should fail
     ${workload_name} =   generate_name_with_suffix    ${workload_kind}    ${workload_id}
     ${volume_name} =    get_workload_volume_name    ${workload_name}
     ${node_name} =    get_volume_node    ${volume_name}
     ${max_size} =    get_volume_node_disk_storage_maximum    ${volume_name}    ${node_name}
     ${new_size} =    evaluate    ${max_size} + 1

     Run Keyword And Expect Error    Failed to expand*    expand_workload_claim_size    ${workload_name}    ${new_size}    skip_retry=True

257-265: Add documentation for volume size assertion keyword.

The implementation is correct, but documentation would help users understand the purpose and usage.

+[Documentation]    Asserts that the volume size remains unchanged for a specified duration.
+...               Arguments:
+...               - workload_kind: Type of the workload (e.g., deployment)
+...               - workload_id: Identifier of the workload
+...               - size: Expected size (e.g., "2GiB")
+...               - period: Duration in seconds to maintain the size
 Assert volume size of ${workload_kind} ${workload_id} remains ${size} for at least ${period} seconds
     ${workload_name} =   generate_name_with_suffix    ${workload_kind}    ${workload_id}
     ${volume_name} =    get_workload_volume_name    ${workload_name}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 38c8f66 and 43a1a09.

📒 Files selected for processing (13)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (2 hunks)
  • e2e/libs/keywords/workload_keywords.py (2 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/libs/persistentvolumeclaim/crd.py (1 hunks)
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (4 hunks)
  • e2e/tests/negative/stress_cpu.robot (2 hunks)
  • e2e/tests/negative/stress_filesystem.robot (2 hunks)
  • e2e/tests/negative/stress_memory.robot (2 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • e2e/tests/negative/stress_memory.robot
  • e2e/tests/negative/stress_cpu.robot
🚧 Files skipped from review as they are similar to previous changes (7)
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/node/node.py
  • e2e/libs/keywords/volume_keywords.py
  • e2e/libs/persistentvolumeclaim/crd.py
  • e2e/keywords/persistentvolumeclaim.resource
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py
🔇 Additional comments (3)
e2e/tests/negative/stress_filesystem.robot (1)

Line range hint 39-54: Consider adding storage capacity validation in expansion tests

Both online and offline expansion tests perform multiple expansions in a loop without checking available storage. This could inadvertently lead to the stuck volume scenario we're trying to prevent.

Add storage capacity checks before expansion:

    FOR    ${i}    IN RANGE    ${LOOP_COUNT}
        ${available_capacity}=    Get Available Storage Capacity
        ${expansion_size}=    Set Variable    100
        Skip If    ${available_capacity} < ${expansion_size}    Insufficient storage space
        When Expand statefulset 0 volume with additional ${expansion_size} MiB
        ...
    END

Also applies to: 63-72

e2e/tests/regression/test_persistentvolumeclaim.robot (1)

22-44: LGTM! Well-structured test case with comprehensive validation.

The test case effectively:

  1. Sets up the environment with storage over-provisioning
  2. Creates and validates PVC with initial size
  3. Verifies expansion failure beyond maximum size
  4. Confirms successful expansion within limits
  5. Ensures data integrity throughout the process
e2e/keywords/workload.resource (1)

Line range hint 211-234: LGTM! Clean formatting changes.

The indentation and conditional logic are well-structured.

longhorn/longhorn-6633

Signed-off-by: Chin-Ya Huang <[email protected]>
longhorn/longhorn-6633

Signed-off-by: Chin-Ya Huang <[email protected]>
@c3y1huang c3y1huang force-pushed the 6633-prevent-resize-stuck branch from 43a1a09 to 331b852 on December 9, 2024 10:33
@c3y1huang c3y1huang requested a review from yangchiu December 9, 2024 10:35
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
e2e/keywords/persistentvolumeclaim.resource (1)

27-34: Consider adding timeout handling for assertion

The new assertion keyword is crucial for verifying PVC size behavior, but it could benefit from additional error handling.

Consider adding a timeout message and handling test failures more gracefully:

 Assert persistentvolumeclaim ${claim_id} requested size remains ${size} for at least ${period} seconds
     ${claim_name} =    generate_name_with_suffix    claim    ${claim_id}
+    ${timeout_msg} =    Set Variable    PVC ${claim_name} size changed unexpectedly within ${period} seconds
     FOR    ${i}    IN RANGE    ${period}
         ${expected_size_byte} =    convert_size_to_bytes    ${size}    to_str=True
         ${current_size_byte} =    get_claim_requested_size    ${claim_name}
-        Should Be Equal    ${current_size_byte}    ${expected_size_byte}
+        Should Be Equal    ${current_size_byte}    ${expected_size_byte}    ${timeout_msg}
         Sleep    1
     END
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 43a1a09 and 331b852.

📒 Files selected for processing (13)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (2 hunks)
  • e2e/libs/keywords/workload_keywords.py (2 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/libs/persistentvolumeclaim/crd.py (1 hunks)
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (4 hunks)
  • e2e/tests/negative/stress_cpu.robot (2 hunks)
  • e2e/tests/negative/stress_filesystem.robot (2 hunks)
  • e2e/tests/negative/stress_memory.robot (2 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • e2e/tests/negative/stress_memory.robot
  • e2e/tests/negative/stress_cpu.robot
  • e2e/tests/negative/stress_filesystem.robot
  • e2e/libs/keywords/volume_keywords.py
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/node/node.py
  • e2e/tests/regression/test_persistentvolumeclaim.robot
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py
  • e2e/libs/persistentvolumeclaim/crd.py
  • e2e/libs/keywords/workload_keywords.py
🔇 Additional comments (8)
e2e/keywords/workload.resource (5)

211-214: LGTM! Improved code formatting

The indentation changes improve code readability while maintaining the same functionality.


236-241: LGTM! Well-structured volume expansion keyword

The implementation correctly handles size conversion and uses appropriate helper functions.


242-247: LGTM! Clear implementation for incremental volume expansion

The keyword provides a good complement to the fixed-size expansion, allowing for relative size increases.


257-265: LGTM! Robust implementation for size verification

The keyword effectively verifies volume size stability over time:

  • Appropriate polling interval (1 second)
  • Proper size conversion for comparison
  • Clear error messages through Robot Framework's built-in assertions
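
In plain Python, the temporal check reduces to roughly the sketch below; get_size_bytes stands in for the keyword's get_volume_size call.

import time

def assert_size_stable(get_size_bytes, expected_bytes, period_seconds, interval=1):
    """Fail fast if the observed size ever deviates within the window."""
    for _ in range(period_seconds):
        current = get_size_bytes()
        if current != expected_bytes:
            raise AssertionError(f"size changed: expected {expected_bytes}, got {current}")
        time.sleep(interval)

# Example with a constant-size stub:
assert_size_stable(lambda: 2 * 1024 ** 3, 2 * 1024 ** 3, period_seconds=3)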

248-256: LGTM! Robust implementation for testing expansion limits

The implementation aligns perfectly with the PR objective to test PVC expansion beyond maximum size. It:

  1. Correctly retrieves the maximum storage size
  2. Ensures the test exceeds this limit by exactly 1 byte
  3. Properly handles the expected failure case

Let's verify the helper functions being used:

✅ Verification successful

Let me verify the implementation of get_volume_node_disk_storage_maximum further since it's crucial for determining the maximum size limit.


Let me try a different approach to get the complete implementation.


Implementation verified - Test case correctly handles volume expansion limits

The test case and its helper functions are properly implemented:

  • get_volume_node_disk_storage_maximum correctly retrieves the maximum storage size by finding the replica's disk and its storageMaximum value
  • expand_workload_claim_size handles the expansion operation with proper size conversion and error handling
  • The test case properly validates the failure scenario by attempting to expand beyond the maximum size by 1 byte
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of helper functions used by the expansion keywords

# Check for the helper function implementations
rg -A 5 "def (convert_size_to_bytes|get_volume_node_disk_storage_maximum|expand_workload_claim_size)"

Length of output: 2809


Script:

#!/bin/bash
# Check the complete implementation of get_volume_node_disk_storage_maximum
ast-grep --pattern 'def get_volume_node_disk_storage_maximum($$$)'

Length of output: 68


Script:

#!/bin/bash
# Check the complete implementation using ripgrep with context
rg -A 15 "def get_volume_node_disk_storage_maximum" e2e/libs/keywords/volume_keywords.py

Length of output: 881

e2e/keywords/persistentvolumeclaim.resource (1)

11-13: LGTM: Enhanced PVC creation flexibility

The addition of the &{config} parameter allows for dynamic configuration of PVC properties, which is essential for testing storage size limitations.

e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (2)

27-29: LGTM: Flexible storage size configuration

The addition of the storage_size parameter with proper byte conversion enhances the PVC creation flexibility needed for testing size limitations.

Also applies to: 44-46


104-107: ⚠️ Potential issue

Add size validation to prevent volume resize stuck issues

Both expand methods should validate the target size against system capacity before attempting expansion, as suggested in the linked issue longhorn/longhorn#6633.

Add validation to both methods:

 def expand(self, claim_name, size_in_byte, skip_retry=False):
+    # Validate target size against system capacity
+    if size_in_byte > self._get_max_storage_size():
+        raise ValueError(f"Target size {size_in_byte} exceeds maximum storage capacity")
     expanded_size = self.claim.expand(claim_name, size_in_byte, skip_retry=skip_retry)
     self.set_annotation(claim_name, ANNOT_EXPANDED_SIZE, str(expanded_size))

 def expand_with_additional_bytes(self, claim_name, size_in_byte, skip_retry=False):
     pvc = self.claim.get(claim_name)
     current_size = int(pvc.spec.resources.requests['storage'])
     target_size = current_size + size_in_byte
+    # Validate target size against system capacity
+    if target_size > self._get_max_storage_size():
+        raise ValueError(f"Target size {target_size} exceeds maximum storage capacity")
     expanded_size = self.claim.expand(claim_name, target_size, skip_retry=skip_retry)
     self.set_annotation(claim_name, ANNOT_EXPANDED_SIZE, str(expanded_size))

Additionally, implement the helper method:

def _get_max_storage_size(self):
    """Get maximum available storage size in bytes"""
    # Implementation needed to fetch system capacity
    raise NotImplementedError("Method needs to be implemented")

Also applies to: 108-114

roger-ryao previously approved these changes Dec 9, 2024
Contributor

@roger-ryao roger-ryao left a comment

LGTM

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
e2e/keywords/workload.resource (1)

257-265: Add documentation for the temporal verification keyword

The implementation effectively verifies volume size stability over time. Consider adding a comment explaining why temporal verification is necessary (e.g., to ensure size remains stable after failed expansion attempts).

 Assert volume size of ${workload_kind} ${workload_id} remains ${size} for at least ${period} seconds
+    [Documentation]    Verifies that the volume size remains stable and unchanged
+    ...    over the specified period. This is particularly useful for ensuring
+    ...    that failed expansion attempts don't cause unexpected size changes.
     ${workload_name} =   generate_name_with_suffix    ${workload_kind}    ${workload_id}
e2e/keywords/persistentvolumeclaim.resource (1)

27-34: Consider optimizing the polling interval

The keyword effectively verifies PVC size stability, but polling every second for the entire period might be resource-intensive for longer durations.

Consider using an exponential backoff or longer intervals:

-        Sleep    1
+        ${sleep_time}=    Evaluate    min(5, 1 * (${i} + 1))    # Gradually increase up to 5 seconds
+        Sleep    ${sleep_time}
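
For comparison, the same capped backoff in plain Python (names are illustrative; note the suggested expression grows linearly up to the cap rather than exponentially):

import time

def poll_with_capped_backoff(check, attempts=60, cap_seconds=5):
    """Run `check` repeatedly, sleeping 1, 2, 3, ... up to cap_seconds between calls."""
    for i in range(attempts):
        check()
        time.sleep(min(cap_seconds, i + 1))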
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 331b852 and 23fe339.

📒 Files selected for processing (13)
  • e2e/keywords/persistentvolumeclaim.resource (2 hunks)
  • e2e/keywords/workload.resource (2 hunks)
  • e2e/libs/keywords/common_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py (2 hunks)
  • e2e/libs/keywords/volume_keywords.py (2 hunks)
  • e2e/libs/keywords/workload_keywords.py (2 hunks)
  • e2e/libs/node/node.py (2 hunks)
  • e2e/libs/persistentvolumeclaim/crd.py (1 hunks)
  • e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (4 hunks)
  • e2e/tests/negative/stress_cpu.robot (2 hunks)
  • e2e/tests/negative/stress_filesystem.robot (2 hunks)
  • e2e/tests/negative/stress_memory.robot (2 hunks)
  • e2e/tests/regression/test_persistentvolumeclaim.robot (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • e2e/tests/negative/stress_cpu.robot
  • e2e/libs/keywords/common_keywords.py
  • e2e/tests/negative/stress_memory.robot
  • e2e/tests/regression/test_persistentvolumeclaim.robot
  • e2e/libs/node/node.py
  • e2e/tests/negative/stress_filesystem.robot
  • e2e/libs/keywords/volume_keywords.py
  • e2e/libs/persistentvolumeclaim/crd.py
  • e2e/libs/keywords/persistentvolumeclaim_keywords.py
  • e2e/libs/keywords/workload_keywords.py
🔇 Additional comments (8)
e2e/keywords/workload.resource (4)

211-214: LGTM: Formatting improvements

The whitespace adjustments improve code readability while maintaining the existing functionality.


236-241: LGTM: Clean implementation of volume expansion keyword

The keyword properly handles size conversion and follows the established pattern for workload management.


242-247: LGTM: Well-structured incremental expansion keyword

The keyword provides clear differentiation between absolute and incremental volume expansion scenarios.


248-256: LGTM: Effectively implements the core test case

This keyword directly addresses the PR objective by testing volume expansion beyond maximum size. The implementation:

  • Correctly retrieves the maximum storage size from the node
  • Attempts expansion with size + 1 to ensure exceeding the limit
  • Properly expects and verifies the failure condition
  • Uses skip_retry=True to prevent unnecessary retries

Let's verify the error message pattern used in the test:

✅ Verification successful

Error message pattern matches the implementation

The error message pattern "Failed to expand*" used in the test matches the actual error message format in e2e/libs/persistentvolumeclaim/crd.py: "Failed to expand {claim_name} size to {size}". The test's error handling is correctly implemented and consistent with the codebase.
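
Robot Framework matches expected errors as glob patterns by default, so the verified check reduces to an fnmatch; the claim name and size below are made up to show the message shape.

from fnmatch import fnmatch

error = "Failed to expand claim-0 size to 5368709121"
assert fnmatch(error, "Failed to expand*")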

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the error message pattern consistency across the codebase

# Search for similar error handling patterns
rg -l 'Failed to expand' --type py

Length of output: 74


Script:

#!/bin/bash
# Let's check the actual error message pattern in the crd.py file
rg -A 5 'Failed to expand' e2e/libs/persistentvolumeclaim/crd.py

Length of output: 136

e2e/keywords/persistentvolumeclaim.resource (1)

11-13: LGTM! Verify the config parameter handling

The addition of the &{config} dictionary argument enhances the keyword's flexibility, allowing for dynamic configuration of PVC parameters including storage size.

✅ Verification successful

LGTM! The config parameter is correctly implemented and used

The verification shows that:

  • The Create persistentvolumeclaim keyword is used extensively across test files with different configurations
  • The implementation in e2e/libs/keywords/persistentvolumeclaim_keywords.py properly handles the config parameters including storage_size
  • Test files like test_persistentvolumeclaim.robot demonstrate correct usage with storage size parameter: Create persistentvolumeclaim 0 using RWX volume storage_size=2GiB
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify the config parameter usage across test files
rg -l "Create persistentvolumeclaim.*&{" test/

Length of output: 176


Script:

#!/bin/bash
# Search for Create persistentvolumeclaim keyword usage with proper regex escaping
rg -l "Create persistentvolumeclaim.*\&\{" test/

# Also search in e2e directory as the keyword is defined there
rg -l "Create persistentvolumeclaim.*\&\{" e2e/

# Let's see the actual implementation of create_persistentvolumeclaim
ast-grep --pattern 'create_persistentvolumeclaim    ${$_}    ${$_}    &{$_}'

Length of output: 225


Script:

#!/bin/bash
# Let's first find all Robot Framework files in the repository
fd -e robot -e resource

# Then search for Create persistentvolumeclaim usage in those files
rg "Create persistentvolumeclaim" -A 2

# Also look for the implementation of create_persistentvolumeclaim
rg "create_persistentvolumeclaim" -A 3

Length of output: 20598

e2e/libs/persistentvolumeclaim/persistentvolumeclaim.py (3)

Line range hint 27-46: LGTM! Good implementation of storage size configuration

The addition of the storage_size parameter with a default value maintains backward compatibility while enabling dynamic storage size testing.
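
The compatibility argument in miniature; the signature and default below are assumptions inferred from the review, not the PR's exact code.

def create_claim(claim_name, volume_type="RWO", storage_size="3GiB"):
    """Callers that omit storage_size keep the previous default;
    new tests can override it explicitly."""
    return {"name": claim_name, "type": volume_type, "size": storage_size}

create_claim("claim-0")                        # existing call sites: unchanged
create_claim("claim-0", storage_size="2GiB")   # new capability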


104-107: ⚠️ Potential issue

Add size validation to prevent volume resize stuck issues

While the skip_retry parameter is a good addition, the method should validate the target size before attempting expansion to prevent volumes from getting stuck.

As suggested in previous reviews, consider adding size validation:

 def expand(self, claim_name, size_in_byte, skip_retry=False):
+    # Validate target size against system capacity
+    if not self._is_size_within_limits(size_in_byte):
+        raise ValueError(f"Target size {size_in_byte} exceeds storage capacity")
     expanded_size = self.claim.expand(claim_name, size_in_byte, skip_retry=skip_retry)
     self.set_annotation(claim_name, ANNOT_EXPANDED_SIZE, str(expanded_size))

108-113: ⚠️ Potential issue

Add size validation and error handling to expansion method

The incremental expansion logic is good, but it should include size validation and proper error handling to prevent stuck volumes.

Consider adding validation and error handling:

 def expand_with_additional_bytes(self, claim_name, size_in_byte, skip_retry=False):
     pvc = self.claim.get(claim_name)
     current_size = int(pvc.spec.resources.requests['storage'])
     target_size = current_size + size_in_byte
+    # Validate target size against system capacity
+    if not self._is_size_within_limits(target_size):
+        raise ValueError(f"Target size {target_size} exceeds storage capacity")
+    try:
         expanded_size = self.claim.expand(claim_name, target_size, skip_retry=skip_retry)
         self.set_annotation(claim_name, ANNOT_EXPANDED_SIZE, str(expanded_size))
+        return expanded_size
+    except Exception as e:
+        logging(f"Failed to expand PVC {claim_name} by {size_in_byte} bytes: {str(e)}")
+        raise

Likely invalid or redundant comment.

@c3y1huang c3y1huang requested a review from yangchiu December 10, 2024 01:21
Member

@yangchiu yangchiu left a comment

LGTM

@yangchiu yangchiu merged commit caaa648 into longhorn:master Dec 10, 2024
5 of 6 checks passed
@c3y1huang c3y1huang deleted the 6633-prevent-resize-stuck branch December 10, 2024 02:29