Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(robot): volume should not reattach after node eviction #2164

Conversation

c3y1huang
Copy link
Collaborator

@c3y1huang c3y1huang commented Nov 14, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9781, longhorn/longhorn#9811

What this PR does / why we need it:

Adding a test case to reproduce longhorn/longhorn#9781.

Special notes for your reviewer:

None

Additional documentation or context

None

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced new keywords for managing nodes and volumes, including Set node ${node_id} with and Assert volume ${volume_id} remains detached for at least ${period} seconds.
    • Added methods for scaling stateful sets and managing storage classes.
    • New test cases added to enhance volume management scenarios, particularly related to node eviction and ensuring volumes remain detached.
  • Bug Fixes

    • Improved error handling and control flow in volume operations and test cases.
  • Documentation

    • Updated test cases to ensure proper state transitions and validations for volume operations.

@c3y1huang c3y1huang self-assigned this Nov 14, 2024
Copy link

coderabbitai bot commented Nov 14, 2024

Walkthrough

The pull request introduces enhancements to the e2e testing framework, adding new keywords for managing nodes and volumes, improving logging functionality, and expanding test cases. Key additions include a method for setting node configurations, a verification step for detached volumes, and new classes for managing storage classes and replicas. The changes streamline existing functionality, particularly in volume management, while maintaining the integrity of existing methods and structures.

Changes

File Change Summary
e2e/keywords/node.resource Added keyword Set node ${node_id} with to configure nodes.
e2e/keywords/volume.resource Introduced keyword Assert volume ${volume_id} remains detached for at least ${period} seconds.
e2e/libs/keywords/backing_image_keywords.py Added method wait_all_backing_image_managers_running.
e2e/libs/keywords/backup_keywords.py Added logging to check_restored_volume_checksum.
e2e/libs/keywords/backupstore_keywords.py Adjusted import statements; no functional changes.
e2e/libs/keywords/common_keywords.py Added methods init_k8s_api_client and generate_name_with_suffix.
e2e/libs/keywords/engine_image_keywords.py Removed unused import for logging.
e2e/libs/keywords/engine_keywords.py Removed unused import for logging.
e2e/libs/keywords/host_keywords.py Minor formatting changes; no functional changes.
e2e/libs/keywords/k8s_keywords.py Reorganized import statements; no functional changes.
e2e/libs/keywords/longhorn_deploy_keywords.py Minor formatting changes; no functional changes.
e2e/libs/keywords/network_keywords.py Added method wait_for_block_network_pod_completed.
e2e/libs/keywords/node_keywords.py Added method set_node for node configuration.
e2e/libs/keywords/persistentvolume_keywords.py Added logging to delete_persistentvolume.
e2e/libs/keywords/replica_keywords.py Introduced new class replica_keywords with method validate_replica_setting.
e2e/libs/keywords/setting_keywords.py Introduced new class setting_keywords with methods for managing settings.
e2e/libs/keywords/sharemanager_keywords.py Minor formatting changes; no functional changes.
e2e/libs/keywords/statefulset_keywords.py Added method scale_statefulset_up and modified scale_statefulset_down.
e2e/libs/keywords/storageclass_keywords.py Introduced new class storageclass_keywords with methods for managing storage classes.
e2e/libs/keywords/workload_keywords.py Removed redundant import for Node; no functional changes.
e2e/libs/node/node.py Added method set_node for updating node properties and enhanced error handling in set_node_scheduling.
e2e/libs/volume/crd.py Modified detach method to simplify control flow and improve error handling.
e2e/tests/negative/network_disconnect.robot Added step to wait for volume 0 to be detached in a test case.
e2e/tests/negative/stress_cpu.robot Added step to wait for volume 0 to be detached in a test case.
e2e/tests/negative/stress_filesystem.robot Added step to wait for volume 0 to be detached in a test case.
e2e/tests/negative/stress_memory.robot Added step to wait for volume 0 to be detached in a test case.
e2e/tests/regression/test_backing_image.robot Added step to wait for volume 0 to be detached in a test case.
e2e/tests/regression/test_backup.robot Added step to wait for volume 3 to be detached in a test case.
e2e/tests/regression/test_basic.robot Added steps to wait for volume detachment in multiple test cases.
e2e/tests/regression/test_settings.robot Added step to wait for volume 1 to be detached in a test case.
e2e/tests/regression/test_v2.robot Added step to wait for volume 0 to be detached in a test case.
e2e/tests/regression/test_volume.robot Added new test case for detached volume management and included new resources.

Assessment against linked issues

Objective Addressed Explanation
Volume should return to detached state during node eviction (9781)

Possibly related PRs

Suggested reviewers

  • chriscchien
  • yangchiu

Poem

🐰 In the fields where volumes play,
A node's fate is set today.
With tests that wait and check with care,
Detached they stay, free from despair.
So let us hop and cheer with glee,
For every change brings harmony! 🌼

Warning

Rate limit exceeded

@c3y1huang has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 10 minutes and 10 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between a132a4e and a6b63ea.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a132a4e and a6b63ea.

📒 Files selected for processing (5)
  • e2e/keywords/node.resource (1 hunks)
  • e2e/keywords/volume.resource (1 hunks)
  • e2e/libs/keywords/node_keywords.py (2 hunks)
  • e2e/libs/node/node.py (1 hunks)
  • e2e/tests/regression/test_volume.robot (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • e2e/keywords/node.resource
  • e2e/keywords/volume.resource
  • e2e/libs/keywords/node_keywords.py
  • e2e/libs/node/node.py
  • e2e/tests/regression/test_volume.robot

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@c3y1huang
Copy link
Collaborator Author

@coderabbitai review

Copy link

coderabbitai bot commented Nov 14, 2024

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@c3y1huang c3y1huang force-pushed the 9781-volume-stuck-attached-during-node-eviction branch from eeadfdb to de9835c Compare November 14, 2024 09:11
@c3y1huang c3y1huang marked this pull request as ready for review November 15, 2024 00:44
@c3y1huang c3y1huang requested a review from a team as a code owner November 15, 2024 00:45
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (25)
e2e/libs/keywords/replica_keywords.py (3)

Line range hint 3-7: Add class-level documentation

The class name follows Robot Framework's keyword naming convention correctly. However, adding a docstring would improve maintainability by documenting the class's purpose and usage in the E2E testing framework.

 class replica_keywords:
+    """Keywords for managing and validating replica settings in E2E tests.
+
+    This class provides Robot Framework keywords for interacting with Longhorn
+    volume replicas, particularly focusing on settings validation.
+    """

Line range hint 8-9: Enhance method documentation and type safety

The method would benefit from type hints and docstring to improve maintainability and IDE support.

-    def validate_replica_setting(self, volume_name, setting_name, value):
-        return self.replica.validate_replica_setting(volume_name, setting_name, value)
+    def validate_replica_setting(self, volume_name: str, setting_name: str, value: str) -> bool:
+        """Validate a specific setting for a volume's replica.
+
+        Args:
+            volume_name: Name of the volume to validate
+            setting_name: Name of the replica setting to check
+            value: Expected value of the setting
+
+        Returns:
+            bool: True if the setting matches the expected value
+        """
+        return self.replica.validate_replica_setting(volume_name, setting_name, value)

Error handling improvements needed in replica validation

The verification reveals minimal error handling in the replica validation chain:

  • The CRD implementation only uses a basic assertion for value comparison
  • No exception handling for missing settings, invalid values, or API failures
  • The Base class defines abstract methods but no error handling patterns

Key locations needing error handling:

  • e2e/libs/replica/crd.py: Add try-except blocks around replica retrieval and validation
  • e2e/libs/keywords/replica_keywords.py: Wrap replica validation calls in error handling
  • e2e/libs/replica/base.py: Define common error handling patterns in the base class
🔗 Analysis chain

Line range hint 1-9: Verify error handling in the Replica class

The code delegates validation to the Replica class without any error handling. We should verify that the underlying Replica class properly handles error cases.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check error handling in the Replica class implementation

# Look for error handling in the Replica class
rg -A 5 "class Replica" 
rg -A 10 "validate_replica_setting" 

# Look for potential error cases and exceptions
rg -l "raise" $(fd -t f "replica.py")

Length of output: 1493


Script:

#!/bin/bash
# Let's check for error handling patterns in the CRD implementation and related files

# Check for assertions and error handling in CRD implementation
rg -A 5 "assert|raise|try|except" e2e/libs/replica/crd.py

# Look for the Base class implementation since Replica inherits from it
rg -A 5 "class Base" 

# Check for any error handling in the volume-related operations
rg -A 5 "def get" e2e/libs/replica/crd.py

Length of output: 3999

e2e/libs/keywords/persistentvolume_keywords.py (1)

Line range hint 11-13: Consider enhancing error handling and documentation

While the logging addition is good for observability, consider these improvements:

  1. Add a docstring describing the method's purpose and parameters
  2. Add error handling for the delete operation
  3. Log the operation result

Here's a suggested implementation:

     def delete_persistentvolume(self, name):
+        """Delete a persistent volume.
+        
+        Args:
+            name: Name of the persistent volume to delete
+        
+        Returns:
+            bool: True if successful, False otherwise
+        """
         logging(f'Deleting persistentvolume {name}')
-        return self.pv.delete(name)
+        try:
+            result = self.pv.delete(name)
+            logging(f'Deleted persistentvolume {name} successfully')
+            return result
+        except Exception as e:
+            logging(f'Failed to delete persistentvolume {name}: {str(e)}')
+            raise
e2e/libs/keywords/setting_keywords.py (1)

1-1: Add class documentation and type hints

The class would benefit from:

  1. A docstring explaining its purpose and usage in Robot Framework tests
  2. Type hints for better IDE support and code maintainability
from setting import Setting
+from typing import Optional

class setting_keywords:
+    """
+    Robot Framework keyword implementations for managing Longhorn settings.
+    
+    This class provides keywords for updating and managing Longhorn settings
+    during e2e tests, including backup store configuration and settings reset.
+    """

Also applies to: 3-4

e2e/libs/keywords/storageclass_keywords.py (5)

4-6: Add class-level documentation.

Consider adding a docstring to document the purpose and usage of this class, especially since it's part of an E2E testing framework.

 class storageclass_keywords:
+    """Keywords for managing storage classes in E2E tests.
+
+    This class provides Robot Framework keywords for creating and cleaning up
+    storage classes during E2E testing, particularly for volume management scenarios.
+    """

Line range hint 7-8: Add method documentation and type hints.

Consider adding a docstring and type hints to improve code maintainability.

     def __init__(self):
+        """Initialize storageclass_keywords with a new StorageClass instance."""
         self.storageclass = StorageClass()

Line range hint 10-12: Enhance method documentation and parameter validation.

The method would benefit from:

  1. A comprehensive docstring describing parameters and their valid values
  2. Type hints for better code maintainability
  3. Input validation for critical parameters
  4. Use of actual boolean values instead of strings
-    def create_storageclass(self, name, numberOfReplicas="3", migratable="false", dataLocality="disabled", fromBackup="", nfsOptions="", dataEngine="v1"):
+    def create_storageclass(
+        self,
+        name: str,
+        numberOfReplicas: str = "3",
+        migratable: bool = False,
+        dataLocality: str = "disabled",
+        fromBackup: str = "",
+        nfsOptions: str = "",
+        dataEngine: str = "v1"
+    ) -> None:
+        """Create a storage class with specified configuration.
+
+        Args:
+            name: Name of the storage class
+            numberOfReplicas: Number of replicas (default: "3")
+            migratable: Whether volumes can be migrated (default: False)
+            dataLocality: Data locality setting (default: "disabled")
+            fromBackup: Backup source if restoring (default: "")
+            nfsOptions: NFS-specific options (default: "")
+            dataEngine: Data engine version (default: "v1")
+
+        Raises:
+            ValueError: If name is empty or numberOfReplicas is invalid
+        """
+        if not name:
+            raise ValueError("Storage class name cannot be empty")
+        if not numberOfReplicas.isdigit() or int(numberOfReplicas) < 1:
+            raise ValueError("Number of replicas must be a positive integer")
+
         logging(f'Creating storageclass with {locals()}')
-        self.storageclass.create(name, numberOfReplicas, migratable, dataLocality, fromBackup, nfsOptions, dataEngine)
+        self.storageclass.create(
+            name,
+            numberOfReplicas,
+            str(migratable).lower(),  # Convert boolean to "true"/"false"
+            dataLocality,
+            fromBackup,
+            nfsOptions,
+            dataEngine
+        )

Line range hint 14-15: Add error handling and documentation.

Consider adding error handling and documentation to make the cleanup process more robust.

-    def cleanup_storageclasses(self):
-        self.storageclass.cleanup()
+    def cleanup_storageclasses(self) -> None:
+        """Clean up all storage classes created during testing.
+
+        This method ensures proper cleanup of storage class resources
+        after test completion.
+        """
+        try:
+            self.storageclass.cleanup()
+        except Exception as e:
+            logging(f"Error during storage class cleanup: {e}")
+            raise

Line range hint 4-15: Consider adding verification methods for storage class states.

Given that this PR focuses on testing volume detachment scenarios, consider adding methods to verify storage class states during node eviction tests. This could include methods to:

  1. Verify storage class configuration
  2. Check volume binding mode
  3. Validate storage class deletion state

This would enhance the testing capabilities for the volume detachment scenarios described in longhorn/longhorn#9781.

e2e/libs/keywords/common_keywords.py (2)

Line range hint 12-17: Add docstrings to new methods

While the implementation is correct, adding docstrings would improve maintainability and help other developers understand the purpose and usage of these methods.

Consider adding docstrings like this:

 def init_k8s_api_client(self):
+    """Initialize the Kubernetes API client for test automation.
+    This method should be called before performing any Kubernetes API operations.
+    """
     init_k8s_api_client()

 def generate_name_with_suffix(self, kind, suffix):
+    """Generate a name with a suffix for test resources.
+    
+    Args:
+        kind (str): The kind of resource (e.g., 'node', 'volume')
+        suffix (str): The suffix to append to the generated name
+        
+    Returns:
+        str: The generated name with the suffix
+    """
     return generate_name_with_suffix(kind, suffix)

Line range hint 1-24: Well-integrated changes supporting test automation needs

The new methods integrate well with the existing node management functionality and support the PR's objective of testing volume behavior during node eviction. The class maintains good cohesion and single responsibility.

Consider creating a separate configuration class if more Kubernetes client setup methods are needed in the future, to maintain the single responsibility principle of this keywords class.

e2e/keywords/node.resource (1)

14-17: Add documentation for the new keyword

The implementation follows the established pattern of other node-related keywords, but lacks documentation. Please add documentation to specify:

  • Purpose of the keyword
  • Expected format of the config dictionary
  • Example usage

Apply this diff:

 Set node ${node_id} with
+    [Documentation]    Updates node configuration with the provided parameters
+    ...    
+    ...    Arguments:
+    ...    - config: Dictionary containing node configuration parameters (e.g., allowScheduling, evictionRequested)
+    ...    
+    ...    Example:
+    ...    | Set node 1 with | evictionRequested=true | allowScheduling=false |
     [Arguments]    &{config}
     ${node_name} =    get_node_by_index    ${node_id}
     set_node   ${node_name}    &{config}
e2e/tests/regression/test_backing_image.robot (1)

32-34: Consider adding backing image state verification after detachment.

While the test properly waits for volume detachment, it might be valuable to verify the backing image state before proceeding with deletion. This could help catch any potential state inconsistencies.

Consider adding this step after volume detachment:

    And Detach volume 0
    And Wait for volume 0 detached
+   And Verify all disk file status of backing image bi are ready
    And Delete volume 0
e2e/libs/keywords/node_keywords.py (1)

37-40: LGTM! Consider adding docstring documentation.

The implementation is clean and aligns well with the PR's objective of testing volume behavior during node eviction. The method provides the necessary controls for node scheduling and eviction states.

Consider adding a docstring to document the parameters and their effects:

 def set_node(self, node_name, allowScheduling=True, evictionRequested=False):
+    """Configure node scheduling and eviction states.
+    
+    Args:
+        node_name: Name of the node to configure
+        allowScheduling: Whether to allow scheduling on this node
+        evictionRequested: Whether to request node eviction
+    """
     logging(f"Setting node {node_name}; scheduling={allowScheduling}; evictionRequested={evictionRequested}")
     self.node.set_node(node_name, allowScheduling, evictionRequested)
e2e/libs/keywords/network_keywords.py (2)

Line range hint 49-50: Enhance the new method with proper error handling and logging

The new method should include:

  1. Logging for better debugging
  2. Error handling
  3. Timeout parameter
  4. Input validation
  5. Documentation

Consider this implementation:

-    def wait_for_block_network_pod_completed(self, pod_name, status, namespace='default'):
-        wait_for_pod_status(pod_name, status, namespace)
+    def wait_for_block_network_pod_completed(self, pod_name, status, namespace='default', timeout=300):
+        """Wait for a network block pod to reach the specified status.
+
+        Args:
+            pod_name: Name of the pod to wait for
+            status: Expected status of the pod
+            namespace: Kubernetes namespace (default: 'default')
+            timeout: Maximum time to wait in seconds (default: 300)
+
+        Raises:
+            ValueError: If pod_name or status is empty
+            TimeoutError: If pod doesn't reach the desired status within timeout
+        """
+        if not pod_name or not status:
+            raise ValueError("Pod name and status must not be empty")
+
+        logging(f"Waiting for pod {pod_name} to reach status {status} in namespace {namespace}")
+        try:
+            wait_for_pod_status(pod_name, status, namespace, timeout)
+            logging(f"Pod {pod_name} reached status {status}")
+        except Exception as e:
+            logging(f"Failed waiting for pod {pod_name}: {str(e)}")
+            raise

47-47: Remove unnecessary blank line

The blank line between methods is not needed as there's already proper spacing.

-
e2e/tests/regression/test_volume.robot (1)

54-62: Consider enhancing test robustness with additional verifications

While the current implementation covers the basic scenario, consider adding these verifications for better test coverage:

  1. Verify node eviction status after setting evictionRequested=true
  2. Check for the presence of stopped replicas on the evicted node
  3. Add a verification step to ensure the volume's attachment mode is actually "detached" in the backend

Example additions:

    And Wait for node 1 evicted
    And Verify no active replicas on node 1
    And Volume 0 should have attachment mode detached
    And Assert volume 0 remains detached for at least 60 seconds
e2e/tests/negative/stress_cpu.robot (1)

44-44: LGTM! Important reliability improvement.

The addition of Wait for volume 0 detached is a crucial improvement that ensures proper state transitions between volume detachment and attachment operations. This explicit wait helps prevent race conditions and makes the test more reliable, especially important when testing under CPU stress conditions.

This change follows good testing practices by:

  1. Ensuring deterministic state transitions
  2. Preventing potential false positives from premature attachment attempts
  3. Making the test more resilient to timing variations under CPU stress
e2e/tests/negative/stress_filesystem.robot (1)

45-45: Excellent addition of wait step to ensure proper volume state transition!

This change is crucial as it ensures the volume is fully detached before attempting to reattach it. Without this wait step, the test could potentially mask issues (like the one in longhorn#9781) where volumes get stuck in an incorrect state. This addition follows testing best practices by explicitly verifying state transitions and preventing race conditions.

This pattern of explicit state verification should be consistently applied across all volume detachment operations in the test suite to ensure reliable test results.

e2e/tests/regression/test_v2.robot (1)

35-35: Consider adding a dedicated test case for volume detachment during node eviction.

While this change improves the basic volume test, it doesn't fully cover the scenario described in longhorn#9781 where a detached volume remains stuck in the attached state during node eviction. Consider adding a new test case that:

  1. Sets up a 3-node cluster with a volume having 3 replicas
  2. Attaches and then detaches the volume
  3. Evicts a node containing one of the replicas
  4. Verifies the volume properly transitions to detached state

Would you like me to help draft the new test case that covers this scenario?

e2e/tests/regression/test_backup.robot (1)

105-105: LGTM! Essential wait step for reliable volume detachment testing.

This addition improves test reliability by ensuring volume 3 is fully detached before proceeding with PV/PVC creation. The change aligns well with the PR's objective of testing volume detachment behavior, particularly in scenarios where volumes might get stuck in an attached state.

This pattern of explicitly waiting for state transitions should be consistently applied across all volume detachment operations in the test suite to prevent race conditions and flaky tests.

e2e/tests/negative/network_disconnect.robot (2)

167-168: Remove duplicate wait condition.

These two consecutive lines are performing the same wait operation:

-    And Wait for volume 0 detached
-    Then Wait for volume 0 detached
+    Then Wait for volume 0 detached

Line range hint 136-168: Test case doesn't fully cover the reported bug scenario.

The test case "Node Disconnect With Statefulset" doesn't fully address the bug reported in longhorn/longhorn#9781 where a detached volume remains stuck in attached state during node eviction. Consider adding a new test case or extending this one to:

  1. Create a three-node cluster setup
  2. Create a volume with three replicas
  3. Attach and then detach the volume
  4. Evict a node that has a stopped replica
  5. Verify that the volume properly transitions to detached state

This will help ensure the specific bug scenario is properly tested and prevent regression.

Would you like me to help create a new test case that specifically addresses this scenario?

e2e/keywords/volume.resource (1)

122-128: LGTM with suggestions for improvement

The implementation correctly verifies that a volume remains detached for the specified duration. However, consider these enhancements for better robustness and observability:

 Assert volume ${volume_id} remains detached for at least ${period} seconds
     ${volume_name} =    generate_name_with_suffix    volume    ${volume_id}
+    Log    Monitoring volume ${volume_name} detached state for ${period} seconds
     FOR    ${i}    IN RANGE    ${period}
         wait_for_volume_detached    ${volume_name}
+        Log    Volume ${volume_name} remained detached for ${i+1} seconds
         Sleep    1
     END
+    Log    Successfully verified volume ${volume_name} remained detached for ${period} seconds

The suggested improvements:

  1. Add logging statements to track the progress and final state
  2. This will help in debugging test failures by providing clear timeline of events
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a360760 and de9835c.

📒 Files selected for processing (32)
  • e2e/keywords/node.resource (1 hunks)
  • e2e/keywords/volume.resource (1 hunks)
  • e2e/libs/keywords/backing_image_keywords.py (1 hunks)
  • e2e/libs/keywords/backup_keywords.py (1 hunks)
  • e2e/libs/keywords/backupstore_keywords.py (1 hunks)
  • e2e/libs/keywords/common_keywords.py (1 hunks)
  • e2e/libs/keywords/engine_image_keywords.py (1 hunks)
  • e2e/libs/keywords/engine_keywords.py (1 hunks)
  • e2e/libs/keywords/host_keywords.py (1 hunks)
  • e2e/libs/keywords/k8s_keywords.py (2 hunks)
  • e2e/libs/keywords/longhorn_deploy_keywords.py (1 hunks)
  • e2e/libs/keywords/network_keywords.py (1 hunks)
  • e2e/libs/keywords/node_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolume_keywords.py (1 hunks)
  • e2e/libs/keywords/replica_keywords.py (1 hunks)
  • e2e/libs/keywords/setting_keywords.py (1 hunks)
  • e2e/libs/keywords/sharemanager_keywords.py (1 hunks)
  • e2e/libs/keywords/statefulset_keywords.py (0 hunks)
  • e2e/libs/keywords/storageclass_keywords.py (1 hunks)
  • e2e/libs/keywords/workload_keywords.py (1 hunks)
  • e2e/libs/node/node.py (1 hunks)
  • e2e/libs/volume/crd.py (0 hunks)
  • e2e/tests/negative/network_disconnect.robot (1 hunks)
  • e2e/tests/negative/stress_cpu.robot (1 hunks)
  • e2e/tests/negative/stress_filesystem.robot (1 hunks)
  • e2e/tests/negative/stress_memory.robot (1 hunks)
  • e2e/tests/regression/test_backing_image.robot (1 hunks)
  • e2e/tests/regression/test_backup.robot (1 hunks)
  • e2e/tests/regression/test_basic.robot (2 hunks)
  • e2e/tests/regression/test_settings.robot (1 hunks)
  • e2e/tests/regression/test_v2.robot (1 hunks)
  • e2e/tests/regression/test_volume.robot (2 hunks)
💤 Files with no reviewable changes (2)
  • e2e/libs/keywords/statefulset_keywords.py
  • e2e/libs/volume/crd.py
✅ Files skipped from review due to trivial changes (7)
  • e2e/libs/keywords/backupstore_keywords.py
  • e2e/libs/keywords/engine_image_keywords.py
  • e2e/libs/keywords/engine_keywords.py
  • e2e/libs/keywords/host_keywords.py
  • e2e/libs/keywords/k8s_keywords.py
  • e2e/libs/keywords/longhorn_deploy_keywords.py
  • e2e/libs/keywords/sharemanager_keywords.py
🧰 Additional context used
🪛 Ruff
e2e/libs/keywords/workload_keywords.py

5-5: Redefinition of unused Node from line 4

(F811)

🔇 Additional comments (24)
e2e/libs/keywords/persistentvolume_keywords.py (2)

1-4: LGTM: Clean import organization

The imports are well-organized and the logging utility is appropriately imported from the common utility module.


Line range hint 1-13: Verify integration with node eviction test scenario

The PV deletion functionality appears to be a supporting operation for the node eviction test case. Please ensure that this implementation adequately supports the test scenario where volumes remain stuck in attached state during node eviction.

✅ Verification successful

Let me gather more information about the test implementation to verify the integration.


Let me check one more aspect to understand how the PV deletion is used in the actual test cleanup.


PV deletion functionality correctly supports node eviction test scenarios

The implementation aligns well with the node eviction test requirements:

  • The PV deletion operation is used as part of the test infrastructure through csi_pv fixture
  • The test scenarios validate volume behavior during node eviction, including:
    • Basic node eviction with replica rescheduling
    • Eviction with no schedulable nodes
    • Eviction with soft anti-affinity
    • Multiple volume eviction cases

The logging addition in the PV deletion operation will help track volume cleanup during these test scenarios.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for related test files that use this keyword
rg -l "delete_persistentvolume" --type py

# Look for the actual test implementation
rg "test.*node.*eviction" --type py

Length of output: 862


Script:

#!/bin/bash
# Check the test implementation details
rg "test_node_eviction" -A 10 manager/integration/tests/test_node.py

# Check how PV deletion is used in the test
rg "delete_persistentvolume" -A 5 -B 5 e2e/libs/keywords/persistentvolumeclaim_keywords.py

Length of output: 3298


Script:

#!/bin/bash
# Check test cleanup and PV usage
rg "def cleanup|def setup" -A 5 manager/integration/tests/test_node.py

# Check PV creation in the test
rg "csi_pv" -A 5 -B 5 manager/integration/tests/test_node.py

Length of output: 4648

e2e/libs/keywords/setting_keywords.py (1)

Line range hint 6-17: Add method documentation and error handling

The methods would benefit from:

  1. Type hints and docstrings to document their purpose and parameters
  2. Error handling for potential exceptions from the Setting instance
     def __init__(self):
+        """Initialize the settings keyword implementation."""
         self.setting = Setting()

-    def update_setting(self, key, value, retry=True):
-        self.setting.update_setting(key, value, retry)
+    def update_setting(self, key: str, value: str, retry: bool = True) -> None:
+        """
+        Update a Longhorn setting with the specified key and value.
+
+        Args:
+            key: The setting key to update
+            value: The new value to set
+            retry: Whether to retry the update on failure
+        """
+        try:
+            self.setting.update_setting(key, value, retry)
+        except Exception as e:
+            raise RuntimeError(f"Failed to update setting {key}: {str(e)}")

Let's verify how these settings are used in the volume reattachment test:

e2e/libs/keywords/common_keywords.py (1)

3-5: LGTM! Well-organized imports

The new imports are properly organized with clear separation, following Python conventions.

e2e/keywords/node.resource (2)

14-17: Implementation pattern looks good

The implementation:

  • Follows the established pattern of other node-related keywords
  • Reuses existing functions appropriately
  • Maintains consistency with the codebase

14-17: Consider adding parameter validation

The keyword accepts a config dictionary without validation. Consider adding validation for:

  1. Required config parameters
  2. Parameter types
  3. Valid value ranges

Let's check what parameters are supported by looking at the node configuration code:

e2e/tests/regression/test_backing_image.robot (1)

33-33: LGTM! Important addition for test reliability.

The added wait step ensures proper volume detachment verification before proceeding with deletion, which is crucial for test reliability and aligns with the PR's focus on volume detachment behavior.

Let's verify this is a consistent pattern across other test files:

✅ Verification successful

Consistent pattern confirmed: Wait steps are properly used before volume operations

The codebase analysis shows a clear and consistent pattern where:

  1. Wait steps are used before volume deletion operations (Wait for volume X detached)
  2. Wait steps are used after replica deletion for rebuilding (Wait until volume X replica rebuilding)

The added wait step in test_backing_image.robot follows this established pattern, as seen in multiple test files:

  • test_v2.robot: "Wait for volume 0 detached" before deletion
  • test_basic.robot: "Wait for volume 0 detached" before deletion
  • Similar patterns in test_backup_listing.robot
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent usage of "Wait for volume" steps before deletion
# Expected: Find similar patterns in other test files to confirm this is a standard practice

# Search for volume deletion patterns in robot files
rg -A 2 "Delete volume" --type robot

# Search for wait-before-delete patterns
rg -B 2 "Delete volume" --type robot

Length of output: 13365

e2e/libs/keywords/backing_image_keywords.py (2)

26-26: LGTM!

The added empty line improves code readability by separating method groups.


28-29: LGTM! Method follows established patterns.

The new method wait_all_backing_image_managers_running follows the class's consistent delegation pattern and will be useful for ensuring backing image stability during volume operations, which is particularly relevant for the volume detachment testing scenario being implemented.

Let's verify the implementation of the delegated method:

✅ Verification successful

Implementation verified and follows expected patterns

The implementation in both classes is correct and follows good practices:

  • The keyword class properly delegates to the CRD class
  • The CRD implementation correctly:
    • Lists all backing image managers
    • Checks their states in a loop with retry logic
    • Verifies each manager is in "running" state
    • Uses appropriate logging
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the implementation of wait_all_backing_image_managers_running in the BackingImage class

# Search for the method implementation in the backing_image.py file
fd --type f "backing_image.py" --exec rg -A 5 "wait_all_backing_image_managers_running"

Length of output: 460


Script:

#!/bin/bash
# Let's verify the implementation in the CRD class
fd --type f "crd.py" --exec rg -A 10 "wait_all_backing_image_managers_running"

Length of output: 721

e2e/libs/keywords/node_keywords.py (1)

37-40: Verify Node class implementation of set_node method.

Let's verify that the Node class properly implements the set_node method with eviction support.

✅ Verification successful

Node class implementation properly handles eviction support

The Node class in e2e/libs/node/node.py correctly implements the set_node method with proper handling of both allowScheduling and evictionRequested parameters:

  • Updates node settings via Longhorn client with both parameters
  • Verifies the changes through assertions
  • Maintains parameter consistency with the calling code
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify Node class implementation of set_node method
# Expected: Find the implementation showing proper handling of evictionRequested parameter

# Search for Node class implementation
ast-grep --pattern 'class Node {
  $$$
  set_node($$$) {
    $$$
  }
  $$$
}'

# Backup: Search using ripgrep if ast-grep doesn't find it
rg -A 10 'def set_node.*node_name.*allowScheduling.*evictionRequested'

Length of output: 1825

e2e/libs/keywords/network_keywords.py (1)

Line range hint 49-50: Verify integration with test framework

Since this method is part of a test case for volume detachment during node eviction (longhorn/longhorn#9781), let's verify its integration with the test framework.

✅ Verification successful

Integration with test framework is properly implemented

The verification shows that:

  • wait_for_pod_status is a well-established utility function from workload.pod module, used across multiple core components
  • network_keywords.py is correctly integrated as a Robot Framework keyword library, providing network-related test functionality
  • The method wait_for_block_network_pod_completed is properly implemented as a wrapper around wait_for_pod_status, maintaining consistent behavior with other network operations in the test framework
  • The implementation follows the same pattern as other network-related operations in the class, such as drop_pod_egress_traffic and disconnect_node_network
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of wait_for_block_network_pod_completed in test files
# Expected: The method should be used in test cases related to volume detachment during node eviction

# Search for test files using this keyword
echo "Test files using wait_for_block_network_pod_completed:"
rg -l "wait_for_block_network_pod_completed" --type robot

# Search for related test scenarios
echo -e "\nRelated test scenarios:"
rg "Test Volume.*[Dd]etach.*[Ee]vict" --type robot

Length of output: 292


Script:

#!/bin/bash
# Let's verify the integration from multiple angles

# 1. Check if wait_for_pod_status is defined and its usage
echo "=== Checking wait_for_pod_status definition and usage ==="
rg "wait_for_pod_status" -A 3

# 2. Check for network-related test files and keywords
echo -e "\n=== Checking network-related test files ==="
fd "network.*\.py|network.*\.robot" 

# 3. Look for test cases involving pod status checks
echo -e "\n=== Checking test cases with pod status verification ==="
rg "Test.*Pod.*Status|Test.*Node.*Eviction" --type robot

# 4. Check the entire network_keywords.py file for context
echo -e "\n=== Content of network_keywords.py ==="
fd "network_keywords.py" --exec cat {}

Length of output: 5668

e2e/tests/regression/test_volume.robot (2)

12-13: LGTM: Resource imports are appropriate and well-organized

The addition of node.resource and volume.resource is necessary for the new test case and follows the existing import pattern.


48-53: LGTM: Test case properly implements reproduction steps for issue #9781

The test case is well-structured with appropriate tags and documentation reference.

e2e/libs/keywords/backup_keywords.py (2)

2-2: LGTM! Clean import addition

The import statement for logging functionality is properly placed and grouped with other utility imports.


Line range hint 43-44: LGTM! Good logging enhancement

The added logging statement improves test observability by clearly indicating which volume and backup are being verified. This will be helpful for debugging test failures, especially in the context of volume attachment issues being tested in this PR.

e2e/tests/negative/stress_memory.robot (1)

46-46: LGTM! Good addition of explicit wait state.

The addition of Wait for volume 0 detached step improves test reliability by ensuring proper state transition before proceeding with the attachment. This change:

  • Prevents potential race conditions
  • Makes the test more deterministic
  • Aligns with similar changes in other test files
  • Follows testing best practices by explicitly waiting for state changes
e2e/tests/regression/test_v2.robot (1)

35-35: LGTM! The added wait step improves test reliability.

Adding an explicit wait for volume detachment before deletion is a good practice as it ensures the volume has completely transitioned to the detached state.

e2e/tests/regression/test_settings.robot (1)

67-67: LGTM! The added verification step improves test reliability.

The addition of Wait for volume 1 detached is a good practice as it ensures the volume detachment operation completes before proceeding with subsequent steps, preventing potential race conditions.

Let's verify this pattern is consistently applied across other test files:

✅ Verification successful

The added verification step follows the established pattern across the test suite

The search results confirm that the Wait for volume X detached verification step is consistently used after volume detachment operations throughout the test suite, appearing in multiple test files including:

  • Regression tests (test_volume.robot, test_v2.robot, test_basic.robot)
  • Negative tests (stress_memory.robot, node_drain.robot, network_disconnect.robot)
  • Other related tests (test_backup.robot, test_backing_image.robot)

This verifies that the added line follows the established best practice for ensuring reliable volume detachment verification across the test suite.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if similar volume detachment verification steps are present in other test files
# Expected: Find similar patterns in other test files as mentioned in the AI summary

rg -A 2 "Detach volume" "e2e/tests/**/*.robot" | rg -B 2 "Wait for volume.*detached"

Length of output: 147


Script:

#!/bin/bash
# Let's try a different approach to find the pattern
# First, let's locate all robot test files
fd ".robot$" e2e/tests/

# Then search for both patterns separately to understand the context better
echo "=== Searching for 'Detach volume' patterns ==="
rg "Detach volume" e2e/tests/

echo -e "\n=== Searching for 'Wait for volume.*detached' patterns ==="
rg "Wait for volume.*detached" e2e/tests/

Length of output: 3818

e2e/tests/regression/test_basic.robot (4)

56-56: LGTM! Good addition for test reliability.

Adding a wait condition after volume detachment prevents potential race conditions and ensures the volume is fully detached before proceeding with deletion.


84-84: LGTM! Essential wait condition before maintenance mode.

Adding this wait condition is crucial before attaching the volume in maintenance mode, as it ensures the previous detachment operation is complete.


90-90: LGTM! Consistent wait condition after detachment.

This wait condition maintains consistency with other detachment operations and ensures clean state transitions in the snapshot reversion workflow.


Line range hint 1-1: Verify the implementation of the node eviction test case.

While the added wait conditions improve test reliability, I notice that the test case mentioned in the PR objectives (testing volume reattachment after node eviction) is not implemented in this file. This test case is crucial for reproducing the issue described in longhorn/longhorn#9781.

Let's check if the test case exists in other test files:

e2e/libs/node/node.py (1)

132-143: Verify the method's usage in test scenarios.

Since this method is crucial for testing volume detachment during node eviction (per PR objectives), we should verify its integration with test cases.

✅ Verification successful

Let me search for actual test cases using this functionality to verify its integration.


Let me check the actual test files that were found to see their eviction-related test cases.


Based on the comprehensive test coverage found, I can now provide the final response:

The set_node method is well integrated with test scenarios

The method is thoroughly tested across multiple test cases that verify node eviction functionality:

  • test_node_eviction: Tests basic node eviction scenario with replica migration
  • test_node_eviction_no_schedulable_node: Verifies behavior when no schedulable nodes are available
  • test_node_eviction_soft_anti_affinity: Tests eviction with anti-affinity settings
  • test_node_eviction_multiple_volume: Validates eviction behavior with multiple volumes
  • test_drain_with_block_for_eviction_success: Tests node draining with eviction policy

The implementation is correctly used to control node scheduling and eviction states, with proper assertions verifying the expected behavior.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find test files using this method and verify its usage context

# Search for test files using this method
echo "Test files using set_node:"
rg -l "set_node.*allowScheduling.*evictionRequested" --type py

# Show the usage context
echo -e "\nUsage context:"
rg -A 5 -B 5 "set_node.*allowScheduling.*evictionRequested" --type py

Length of output: 2803


Script:

#!/bin/bash
# Search for test files that use node eviction functionality
echo "Test files with eviction tests:"
rg -l "test.*eviction" --type py

# Show the test context
echo -e "\nTest context:"
rg -B 10 -A 10 "test.*eviction.*def" --type py

# Search for Robot Framework test cases using the set_node keyword
echo -e "\nRobot Framework test cases:"
rg -l "Set Node.*scheduling.*eviction" --type robot

Length of output: 411


Script:

#!/bin/bash
# Show the eviction-related test cases from the identified files
echo "Eviction test cases in test_node.py:"
rg -B 5 -A 15 "eviction" manager/integration/tests/test_node.py

echo -e "\nEviction test cases in test_backing_image.py:"
rg -B 5 -A 15 "eviction" manager/integration/tests/test_backing_image.py

# Also check for any e2e test files that might use this functionality
echo -e "\nE2E tests using node eviction:"
fd -e robot . e2e/suites/ -x grep -l "eviction" {} \;

Length of output: 30157

e2e/keywords/volume.resource (1)

122-128: Verification: Keyword aligns with PR objectives

This keyword effectively supports testing the issue described in longhorn/longhorn#9781 by providing a reliable way to verify that a volume remains detached after node eviction. The implementation allows for sufficient observation time to detect any unexpected reattachment.

To verify the keyword's usage in test cases:

✅ Verification successful

Keyword implementation and usage verified as correct

The keyword is correctly implemented and properly used in the test case to verify that a volume remains detached after node eviction, which directly aligns with issue #9781. The test case demonstrates the exact scenario where it:

  1. Detaches a volume
  2. Triggers node eviction
  3. Verifies the volume stays detached for 60 seconds
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find test cases using this keyword to ensure proper integration
rg -l "Assert volume .+ remains detached for at least" --type robot

Length of output: 108


Script:

#!/bin/bash
# Check the usage context in the test file
rg -B 5 -A 5 "Assert volume .+ remains detached for at least" e2e/tests/regression/test_volume.robot

Length of output: 305

e2e/libs/node/node.py Outdated Show resolved Hide resolved
@c3y1huang c3y1huang force-pushed the 9781-volume-stuck-attached-during-node-eviction branch from de9835c to a436709 Compare November 15, 2024 01:06
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
e2e/tests/regression/test_volume.robot (1)

54-62: Consider enhancing the test robustness

While the test implementation is correct, consider adding these verifications for better robustness:

  1. Verify the volume has all replicas healthy before detachment
  2. Verify the node status after setting eviction
 Given Create volume 0 with    dataEngine=${DATA_ENGINE}
 And Attach volume 0
 And Wait for volume 0 healthy
+And Wait for volume 0 replica count 3
+And Wait for volume 0 all replicas healthy

 When Detach volume 0
 And Set node 1 with    allowScheduling=false    evictionRequested=true
+And Wait for node 1 eviction requested
 
 Then Wait for volume 0 detached
 And Assert volume 0 remains detached for at least 60 seconds
e2e/libs/node/node.py (1)

132-153: Consider enhancing error handling.

The current error handling could be improved to provide more context about failures and handle specific exceptions.

Consider applying this enhancement:

     def set_node(self, node_name: str, allowScheduling: bool, evictionRequested: bool) -> object:
         for _ in range(self.retry_count):
             try:
                 node = get_longhorn_client().by_id_node(node_name)
                 get_longhorn_client().update(node,
                     allowScheduling=allowScheduling,
                     evictionRequested=evictionRequested)
 
                 node = get_longhorn_client().by_id_node(node_name)
                 assert node.allowScheduling == allowScheduling
                 assert node.evictionRequested == evictionRequested
                 return node
-            except Exception as e:
-                logging(f"Updating node {node_name} error: {e}")
+            except AssertionError as e:
+                logging(f"Node {node_name} property verification failed: {e}")
+            except Exception as e:
+                logging(f"Failed to update node {node_name}: {e}")
+                if "not found" in str(e):
+                    raise  # Don't retry if node doesn't exist
             time.sleep(self.retry_interval)
         raise AssertionError(f"Updating node {node_name} failed")
🧰 Tools
🪛 Ruff

153-153: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between de9835c and a436709.

📒 Files selected for processing (5)
  • e2e/keywords/node.resource (1 hunks)
  • e2e/keywords/volume.resource (1 hunks)
  • e2e/libs/keywords/node_keywords.py (2 hunks)
  • e2e/libs/node/node.py (1 hunks)
  • e2e/tests/regression/test_volume.robot (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • e2e/keywords/node.resource
  • e2e/keywords/volume.resource
  • e2e/libs/keywords/node_keywords.py
🧰 Additional context used
🪛 Ruff
e2e/libs/node/node.py

153-153: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

🔇 Additional comments (3)
e2e/tests/regression/test_volume.robot (2)

12-13: LGTM: Resource additions are well-organized

The new resources for node and volume management are appropriately placed and follow the existing resource organization pattern.


48-53: LGTM: Well-structured test case documentation

The test case is properly documented with clear tags and references the correct issue. The documentation follows best practices.

e2e/libs/node/node.py (1)

132-153: Verify the impact on test cases.

Since this is a new method in the E2E testing framework, we should verify its usage in test cases.

✅ Verification successful

Let me gather more context about the test files and their relationships.


Let me search for Robot Framework test files using a different pattern since they typically use space-separated keywords rather than dot notation.


Let me check the content of the node.resource file and the test files that import it to understand how the node keywords are being used.


Based on the gathered information, I can now provide the final response:

The new set_node method is properly integrated and used in the test framework.

The method is part of a well-structured test framework where:

  • It's implemented in e2e/libs/node/node.py as the core functionality
  • It's wrapped in e2e/libs/keywords/node_keywords.py for Robot Framework integration
  • It's used through the node.resource import in multiple test files:
    • e2e/tests/negative/node_drain.robot
    • e2e/tests/regression/test_v2.robot
    • e2e/tests/regression/test_volume.robot
    • e2e/tests/regression/test_basic.robot

The test files, particularly node_drain.robot, contain comprehensive test cases for node operations including draining, scheduling, and eviction scenarios, which would exercise this new method.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find test cases using the new set_node method
# Expected: Find Robot Framework test files that use this keyword

# Search for direct usage of set_node in Robot Framework test files
echo "Searching for direct usage in test files..."
rg -t robot "Set Node.*With" .

# Search for Python test files that might use this method
echo "Searching for usage in Python test files..."
rg "set_node\(" -t py .

Length of output: 720


Script:

#!/bin/bash
# Check the content of node_keywords.py to understand the usage
echo "Content of node_keywords.py:"
cat e2e/libs/keywords/node_keywords.py

# Check for Robot Framework test files that might use node keywords
echo -e "\nSearching for Robot test files using node keywords:"
rg -t robot "Node\." .

Length of output: 2138


Script:

#!/bin/bash
# Search for Robot Framework test files using node keywords without dot notation
echo "Searching for Robot test files using node keywords:"
rg -t robot "Set Node|Disable Node|Enable Node|Reset Node" .

# Also check for any Robot Framework test files that might import node keywords
echo -e "\nSearching for Robot test files importing node keywords:"
rg -t robot "Resource.*node" .

Length of output: 721


Script:

#!/bin/bash
# Check the node.resource file
echo "Content of node.resource:"
cat e2e/tests/keywords/node.resource

# Check test files that use node keywords
echo -e "\nContent of node_drain.robot:"
cat e2e/tests/negative/node_drain.robot

echo -e "\nContent of test_basic.robot:"
cat e2e/tests/regression/test_basic.robot

Length of output: 19338

🧰 Tools
🪛 Ruff

153-153: Do not assert False (python -O removes these calls), raise AssertionError()

Replace assert False

(B011)

e2e/libs/node/node.py Outdated Show resolved Hide resolved
@c3y1huang c3y1huang force-pushed the 9781-volume-stuck-attached-during-node-eviction branch 2 times, most recently from f9a5a3f to 3d0a732 Compare November 15, 2024 01:41
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f9a5a3f and 3d0a732.

📒 Files selected for processing (32)
  • e2e/keywords/node.resource (1 hunks)
  • e2e/keywords/volume.resource (1 hunks)
  • e2e/libs/keywords/backing_image_keywords.py (1 hunks)
  • e2e/libs/keywords/backup_keywords.py (1 hunks)
  • e2e/libs/keywords/backupstore_keywords.py (1 hunks)
  • e2e/libs/keywords/common_keywords.py (1 hunks)
  • e2e/libs/keywords/engine_image_keywords.py (1 hunks)
  • e2e/libs/keywords/engine_keywords.py (1 hunks)
  • e2e/libs/keywords/host_keywords.py (1 hunks)
  • e2e/libs/keywords/k8s_keywords.py (2 hunks)
  • e2e/libs/keywords/longhorn_deploy_keywords.py (1 hunks)
  • e2e/libs/keywords/network_keywords.py (1 hunks)
  • e2e/libs/keywords/node_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolume_keywords.py (1 hunks)
  • e2e/libs/keywords/replica_keywords.py (1 hunks)
  • e2e/libs/keywords/setting_keywords.py (1 hunks)
  • e2e/libs/keywords/sharemanager_keywords.py (1 hunks)
  • e2e/libs/keywords/statefulset_keywords.py (0 hunks)
  • e2e/libs/keywords/storageclass_keywords.py (1 hunks)
  • e2e/libs/keywords/workload_keywords.py (1 hunks)
  • e2e/libs/node/node.py (1 hunks)
  • e2e/libs/volume/crd.py (0 hunks)
  • e2e/tests/negative/network_disconnect.robot (1 hunks)
  • e2e/tests/negative/stress_cpu.robot (1 hunks)
  • e2e/tests/negative/stress_filesystem.robot (1 hunks)
  • e2e/tests/negative/stress_memory.robot (1 hunks)
  • e2e/tests/regression/test_backing_image.robot (1 hunks)
  • e2e/tests/regression/test_backup.robot (1 hunks)
  • e2e/tests/regression/test_basic.robot (2 hunks)
  • e2e/tests/regression/test_settings.robot (1 hunks)
  • e2e/tests/regression/test_v2.robot (1 hunks)
  • e2e/tests/regression/test_volume.robot (2 hunks)
💤 Files with no reviewable changes (2)
  • e2e/libs/keywords/statefulset_keywords.py
  • e2e/libs/volume/crd.py
🚧 Files skipped from review as they are similar to previous changes (29)
  • e2e/keywords/node.resource
  • e2e/keywords/volume.resource
  • e2e/libs/keywords/backing_image_keywords.py
  • e2e/libs/keywords/backup_keywords.py
  • e2e/libs/keywords/backupstore_keywords.py
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/keywords/engine_image_keywords.py
  • e2e/libs/keywords/engine_keywords.py
  • e2e/libs/keywords/host_keywords.py
  • e2e/libs/keywords/k8s_keywords.py
  • e2e/libs/keywords/longhorn_deploy_keywords.py
  • e2e/libs/keywords/network_keywords.py
  • e2e/libs/keywords/node_keywords.py
  • e2e/libs/keywords/persistentvolume_keywords.py
  • e2e/libs/keywords/replica_keywords.py
  • e2e/libs/keywords/setting_keywords.py
  • e2e/libs/keywords/sharemanager_keywords.py
  • e2e/libs/keywords/storageclass_keywords.py
  • e2e/libs/node/node.py
  • e2e/tests/negative/network_disconnect.robot
  • e2e/tests/negative/stress_cpu.robot
  • e2e/tests/negative/stress_filesystem.robot
  • e2e/tests/negative/stress_memory.robot
  • e2e/tests/regression/test_backing_image.robot
  • e2e/tests/regression/test_backup.robot
  • e2e/tests/regression/test_basic.robot
  • e2e/tests/regression/test_settings.robot
  • e2e/tests/regression/test_v2.robot
  • e2e/tests/regression/test_volume.robot
🧰 Additional context used
🪛 Ruff
e2e/libs/keywords/workload_keywords.py

5-5: Redefinition of unused Node from line 4

(F811)

e2e/libs/keywords/workload_keywords.py Outdated Show resolved Hide resolved
Signed-off-by: Chin-Ya Huang <[email protected]>
@c3y1huang c3y1huang force-pushed the 9781-volume-stuck-attached-during-node-eviction branch from 3d0a732 to a132a4e Compare November 15, 2024 01:46
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3d0a732 and a132a4e.

📒 Files selected for processing (32)
  • e2e/keywords/node.resource (1 hunks)
  • e2e/keywords/volume.resource (1 hunks)
  • e2e/libs/keywords/backing_image_keywords.py (1 hunks)
  • e2e/libs/keywords/backup_keywords.py (1 hunks)
  • e2e/libs/keywords/backupstore_keywords.py (1 hunks)
  • e2e/libs/keywords/common_keywords.py (1 hunks)
  • e2e/libs/keywords/engine_image_keywords.py (1 hunks)
  • e2e/libs/keywords/engine_keywords.py (1 hunks)
  • e2e/libs/keywords/host_keywords.py (1 hunks)
  • e2e/libs/keywords/k8s_keywords.py (2 hunks)
  • e2e/libs/keywords/longhorn_deploy_keywords.py (1 hunks)
  • e2e/libs/keywords/network_keywords.py (1 hunks)
  • e2e/libs/keywords/node_keywords.py (2 hunks)
  • e2e/libs/keywords/persistentvolume_keywords.py (1 hunks)
  • e2e/libs/keywords/replica_keywords.py (1 hunks)
  • e2e/libs/keywords/setting_keywords.py (1 hunks)
  • e2e/libs/keywords/sharemanager_keywords.py (1 hunks)
  • e2e/libs/keywords/statefulset_keywords.py (0 hunks)
  • e2e/libs/keywords/storageclass_keywords.py (1 hunks)
  • e2e/libs/keywords/workload_keywords.py (0 hunks)
  • e2e/libs/node/node.py (1 hunks)
  • e2e/libs/volume/crd.py (0 hunks)
  • e2e/tests/negative/network_disconnect.robot (1 hunks)
  • e2e/tests/negative/stress_cpu.robot (1 hunks)
  • e2e/tests/negative/stress_filesystem.robot (1 hunks)
  • e2e/tests/negative/stress_memory.robot (1 hunks)
  • e2e/tests/regression/test_backing_image.robot (1 hunks)
  • e2e/tests/regression/test_backup.robot (1 hunks)
  • e2e/tests/regression/test_basic.robot (2 hunks)
  • e2e/tests/regression/test_settings.robot (1 hunks)
  • e2e/tests/regression/test_v2.robot (1 hunks)
  • e2e/tests/regression/test_volume.robot (2 hunks)
💤 Files with no reviewable changes (3)
  • e2e/libs/keywords/statefulset_keywords.py
  • e2e/libs/keywords/workload_keywords.py
  • e2e/libs/volume/crd.py
🚧 Files skipped from review as they are similar to previous changes (28)
  • e2e/keywords/node.resource
  • e2e/keywords/volume.resource
  • e2e/libs/keywords/backing_image_keywords.py
  • e2e/libs/keywords/backup_keywords.py
  • e2e/libs/keywords/backupstore_keywords.py
  • e2e/libs/keywords/common_keywords.py
  • e2e/libs/keywords/engine_image_keywords.py
  • e2e/libs/keywords/engine_keywords.py
  • e2e/libs/keywords/host_keywords.py
  • e2e/libs/keywords/k8s_keywords.py
  • e2e/libs/keywords/longhorn_deploy_keywords.py
  • e2e/libs/keywords/network_keywords.py
  • e2e/libs/keywords/node_keywords.py
  • e2e/libs/keywords/persistentvolume_keywords.py
  • e2e/libs/keywords/replica_keywords.py
  • e2e/libs/keywords/setting_keywords.py
  • e2e/libs/keywords/sharemanager_keywords.py
  • e2e/libs/keywords/storageclass_keywords.py
  • e2e/tests/negative/network_disconnect.robot
  • e2e/tests/negative/stress_cpu.robot
  • e2e/tests/negative/stress_filesystem.robot
  • e2e/tests/negative/stress_memory.robot
  • e2e/tests/regression/test_backing_image.robot
  • e2e/tests/regression/test_backup.robot
  • e2e/tests/regression/test_basic.robot
  • e2e/tests/regression/test_settings.robot
  • e2e/tests/regression/test_v2.robot
  • e2e/tests/regression/test_volume.robot
🔇 Additional comments (1)
e2e/libs/node/node.py (1)

131-153: Implementation aligns well with PR objectives.

The method provides reliable functionality for setting node eviction state, which is crucial for testing volume detachment during node eviction scenarios. The retry mechanism and validation checks ensure test reliability.

e2e/libs/node/node.py Outdated Show resolved Hide resolved
@c3y1huang c3y1huang force-pushed the 9781-volume-stuck-attached-during-node-eviction branch from a132a4e to a6b63ea Compare November 15, 2024 02:04
Copy link
Contributor

@chriscchien chriscchien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants