diff --git a/index.html b/index.html index 73c3c900fc..e742a0dc23 100644 --- a/index.html +++ b/index.html @@ -1,7 +1,7 @@ - + Longhorn Manual Test Cases diff --git a/index.xml b/index.xml index f99479cab3..fa323798b6 100644 --- a/index.xml +++ b/index.xml @@ -26,13 +26,7 @@ <link>https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/restore-volume-node-down/</link> <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate> <guid>https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/restore-volume-node-down/</guid> - <description>Case 1: Create a backup. -Restore the above backup. -Power off the volume attached node during the restoring. -Wait for the Longhorn node down. -Wait for the restore volume being reattached and starting restoring volume with state Degraded. -Wait for the restore complete. -Note: During the restoration process, if the engine process fails to communicate with a replica, all replicas will be marked as ERR, and the volume&rsquo;s RestoreRequired status cannot be set to false.</description> + <description>Case 1: Create a backup. Restore the above backup. Power off the volume attached node during the restoring. Wait for the Longhorn node down. Wait for the restore volume being reattached and starting restoring volume with state Degraded. Wait for the restore complete. Note: During the restoration process, if the engine process fails to communicate with a replica, all replicas will be marked as ERR, and the volume&rsquo;s RestoreRequired status cannot be set to false.</description> </item> <item> <title>[#1366](https://github.com/longhorn/longhorn/issues/1366) && [#1328](https://github.com/longhorn/longhorn/issues/1328) The node the DR volume attached to is rebooted @@ -53,9 +47,7 @@ Note: During the restoration process, if the engine process fails to communicate https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/simulated-slow-disk/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/simulated-slow-disk/ - This case requires the creation of a slow virtual disk with dmsetup. -Make a slow disk: -Make a disk image file: truncate -s 10g slow.img Create a loopback device: losetup --show -P -f slow.img Get the block size of the loopback device: blockdev --getsize /dev/loopX Create slow device: echo &quot;0 &lt;blocksize&gt; delay /dev/loopX 0 500&quot; | dmsetup create dm-slow Format slow device: mkfs.ext4 /dev/mapper/dm-slow Mount slow device: mount /dev/mapper/dm-slow /mnt Build longhorn-engine and run it on the slow disk. + This case requires the creation of a slow virtual disk with dmsetup. Make a slow disk: Make a disk image file: truncate -s 10g slow.img Create a loopback device: losetup --show -P -f slow.img Get the block size of the loopback device: blockdev --getsize /dev/loopX Create slow device: echo &quot;0 &lt;blocksize&gt; delay /dev/loopX 0 500&quot; | dmsetup create dm-slow Format slow device: mkfs.ext4 /dev/mapper/dm-slow Mount slow device: mount /dev/mapper/dm-slow /mnt Build longhorn-engine and run it on the slow disk. 
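For reference, the slow-disk commands quoted in the simulated-slow-disk case above chain together as follows; the LOOP_DEV and BLOCKS shell variables are introduced here purely for readability and are not part of the original instructions:

    # Create the backing image and a delayed device-mapper target (500 ms per I/O, as in the description)
    truncate -s 10g slow.img
    LOOP_DEV=$(losetup --show -P -f slow.img)      # prints the created loopback device, e.g. /dev/loopX
    BLOCKS=$(blockdev --getsize "$LOOP_DEV")       # device size in 512-byte sectors
    echo "0 $BLOCKS delay $LOOP_DEV 0 500" | dmsetup create dm-slow
    mkfs.ext4 /dev/mapper/dm-slow
    mount /dev/mapper/dm-slow /mnt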
[#4637](https://github.com/longhorn/longhorn/issues/4637) pull backup created by another Longhorn system @@ -76,11 +68,7 @@ Make a disk image file: truncate -s 10g slow.img Create a loopback device: loset https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/aks/expand-volume/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/aks/expand-volume/ - Create AKS cluster with 3 nodes and install Longhorn. -Create deployment and write some data to it. -In Longhorn, set replica-replenishment-wait-interval to 0. -Add a new node-pool. Later Longhorn components will be automatically deployed on the nodes in this pool. -AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_RESOURCE_GROUP=&lt;aks-resource-group&gt; AKS_CLUSTER_NAME=&lt;aks-cluster-name&gt; AKS_DISK_SIZE_NEW=&lt;new-disk-size-in-gb&gt; AKS_NODE_NUM=&lt;number-of-nodes&gt; AKS_K8S_VERSION=&lt;kubernetes-version&gt; az aks nodepool add \ --resource-group ${AKS_RESOURCE_GROUP} \ --cluster-name ${AKS_CLUSTER_NAME} \ --name ${AKS_NODEPOOL_NAME_NEW} \ --node-count ${AKS_NODE_NUM} \ --node-osdisk-size ${AKS_DISK_SIZE_NEW} \ --kubernetes-version ${AKS_K8S_VERSION} \ --mode System Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. + Create AKS cluster with 3 nodes and install Longhorn. Create deployment and write some data to it. In Longhorn, set replica-replenishment-wait-interval to 0. Add a new node-pool. Later Longhorn components will be automatically deployed on the nodes in this pool. AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_RESOURCE_GROUP=&lt;aks-resource-group&gt; AKS_CLUSTER_NAME=&lt;aks-cluster-name&gt; AKS_DISK_SIZE_NEW=&lt;new-disk-size-in-gb&gt; AKS_NODE_NUM=&lt;number-of-nodes&gt; AKS_K8S_VERSION=&lt;kubernetes-version&gt; az aks nodepool add \ --resource-group ${AKS_RESOURCE_GROUP} \ --cluster-name ${AKS_CLUSTER_NAME} \ --name ${AKS_NODEPOOL_NAME_NEW} \ --node-count ${AKS_NODE_NUM} \ --node-osdisk-size ${AKS_DISK_SIZE_NEW} \ --kubernetes-version ${AKS_K8S_VERSION} \ --mode System Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. [Expand Volume](https://longhorn.io/docs/1.3.0/advanced-resources/support-managed-k8s-service/manage-node-group-on-eks/#storage-expansion) @@ -94,23 +82,14 @@ AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_RESOURCE_GROUP=& https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/gke/expand-volume/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/gke/expand-volume/ - Create GKE cluster with 3 nodes and install Longhorn. -Create deployment and write some data to it. -In Longhorn, set replica-replenishment-wait-interval to 0. -Add a new node-pool. Later Longhorn components will be automatically deployed on the nodes in this pool. 
-GKE_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; GKE_REGION=&lt;gke-region&gt; GKE_CLUSTER_NAME=&lt;gke-cluster-name&gt; GKE_IMAGE_TYPE=Ubuntu GKE_MACHINE_TYPE=&lt;gcp-machine-type&gt; GKE_DISK_SIZE_NEW=&lt;new-disk-size-in-gb&gt; GKE_NODE_NUM=&lt;number-of-nodes&gt; gcloud container node-pools create ${GKE_NODEPOOL_NAME_NEW} \ --region ${GKE_REGION} \ --cluster ${GKE_CLUSTER_NAME} \ --image-type ${GKE_IMAGE_TYPE} \ --machine-type ${GKE_MACHINE_TYPE} \ --disk-size ${GKE_DISK_SIZE_NEW} \ --num-nodes ${GKE_NODE_NUM} gcloud container node-pools list \ --zone ${GKE_REGION} \ --cluster ${GKE_CLUSTER_NAME} Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. + Create GKE cluster with 3 nodes and install Longhorn. Create deployment and write some data to it. In Longhorn, set replica-replenishment-wait-interval to 0. Add a new node-pool. Later Longhorn components will be automatically deployed on the nodes in this pool. GKE_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; GKE_REGION=&lt;gke-region&gt; GKE_CLUSTER_NAME=&lt;gke-cluster-name&gt; GKE_IMAGE_TYPE=Ubuntu GKE_MACHINE_TYPE=&lt;gcp-machine-type&gt; GKE_DISK_SIZE_NEW=&lt;new-disk-size-in-gb&gt; GKE_NODE_NUM=&lt;number-of-nodes&gt; gcloud container node-pools create ${GKE_NODEPOOL_NAME_NEW} \ --region ${GKE_REGION} \ --cluster ${GKE_CLUSTER_NAME} \ --image-type ${GKE_IMAGE_TYPE} \ --machine-type ${GKE_MACHINE_TYPE} \ --disk-size ${GKE_DISK_SIZE_NEW} \ --num-nodes ${GKE_NODE_NUM} gcloud container node-pools list \ --zone ${GKE_REGION} \ --cluster ${GKE_CLUSTER_NAME} Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. [Upgrade K8s](https://longhorn.io/docs/1.3.0/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-aks/) https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/aks/upgrade-k8s/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/aks/upgrade-k8s/ - Create AKS cluster with 3 nodes and install Longhorn. -Create deployment and write some data to it. -In Longhorn, set replica-replenishment-wait-interval to 0. -Upgrade AKS control plane. -AKS_RESOURCE_GROUP=&lt;aks-resource-group&gt; AKS_CLUSTER_NAME=&lt;aks-cluster-name&gt; AKS_K8S_VERSION_UPGRADE=&lt;aks-k8s-version&gt; az aks upgrade \ --resource-group ${AKS_RESOURCE_GROUP} \ --name ${AKS_CLUSTER_NAME} \ --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ --control-plane-only Add a new node-pool. -AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_DISK_SIZE=&lt;disk-size-in-gb&gt; AKS_NODE_NUM=&lt;number-of-nodes&gt; az aks nodepool add \ --resource-group ${AKS_RESOURCE_GROUP} \ --cluster-name ${AKS_CLUSTER_NAME} \ --name ${AKS_NODEPOOL_NAME_NEW} \ --node-count ${AKS_NODE_NUM} \ --node-osdisk-size ${AKS_DISK_SIZE} \ --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ --mode System Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. + Create AKS cluster with 3 nodes and install Longhorn. Create deployment and write some data to it. In Longhorn, set replica-replenishment-wait-interval to 0. Upgrade AKS control plane. AKS_RESOURCE_GROUP=&lt;aks-resource-group&gt; AKS_CLUSTER_NAME=&lt;aks-cluster-name&gt; AKS_K8S_VERSION_UPGRADE=&lt;aks-k8s-version&gt; az aks upgrade \ --resource-group ${AKS_RESOURCE_GROUP} \ --name ${AKS_CLUSTER_NAME} \ --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ --control-plane-only Add a new node-pool. 
AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_DISK_SIZE=&lt;disk-size-in-gb&gt; AKS_NODE_NUM=&lt;number-of-nodes&gt; az aks nodepool add \ --resource-group ${AKS_RESOURCE_GROUP} \ --cluster-name ${AKS_CLUSTER_NAME} \ --name ${AKS_NODEPOOL_NAME_NEW} \ --node-count ${AKS_NODE_NUM} \ --node-osdisk-size ${AKS_DISK_SIZE} \ --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ --mode System Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. [Upgrade K8s](https://longhorn.io/docs/1.3.0/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-eks/) @@ -131,126 +110,70 @@ AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_DISK_SIZE=&lt;di https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/deployment/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/deployment/ - Installation Longhorn v1.1.2 and above - Support Kubernetes 1.18+ -Longhorn v1.0.0 to v1.1.1 - Support Kubernetes 1.14+. Default 1.16+ -Install using Rancher Apps &amp; MarketPlace App (Default) -Install using Helm chart from https://github.com/longhorn/longhorn/tree/master/chart -Install using YAML from https://github.com/longhorn/longhorn/blob/master/deploy/longhorn.yaml -Note: Longhorn UI can scale to multiple instances for HA purposes. -Uninstallation Make sure all the CRDs and other resources are cleaned up, following the uninstallation instruction. https://longhorn.io/docs/1.2.2/deploy/uninstall/ -Customizable Default Settings https://longhorn.io/docs/1.2.2/references/settings/ + Installation Longhorn v1.1.2 and above - Support Kubernetes 1.18+ Longhorn v1.0.0 to v1.1.1 - Support Kubernetes 1.14+. Default 1.16+ Install using Rancher Apps &amp; MarketPlace App (Default) Install using Helm chart from https://github.com/longhorn/longhorn/tree/master/chart Install using YAML from https://github.com/longhorn/longhorn/blob/master/deploy/longhorn.yaml Note: Longhorn UI can scale to multiple instances for HA purposes. Uninstallation Make sure all the CRDs and other resources are cleaned up, following the uninstallation instruction. https://longhorn.io/docs/1.2.2/deploy/uninstall/ Customizable Default Settings https://longhorn.io/docs/1.2.2/references/settings/ 2. UI https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/ui/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/ui/ - Accessibility of Longhorn UI # Test Case Test Instructions 1. Access Longhorn UI using rancher proxy 1. Create a cluster (3 worker nodes and 1 etcd/control plane) in rancher, Go to the default project. -2. Go to App, Click the launch app. -3. Select longhorn. -4. Select Rancher-Proxy under the Longhorn UI service. -5. Once the app is deployed successfully, click the /index.html link appears in App page. -6. The page should redirect to longhorn UI - https://rancher/k8s/clusters/c-aaaa/api/v1/namespaces/longhorn-system/services/http:longhorn-frontend:80/proxy/#/dashboard + Accessibility of Longhorn UI # Test Case Test Instructions 1. Access Longhorn UI using rancher proxy 1. Create a cluster (3 worker nodes and 1 etcd/control plane) in rancher, Go to the default project. 2. Go to App, Click the launch app. 3. Select longhorn. 4. Select Rancher-Proxy under the Longhorn UI service. 5. Once the app is deployed successfully, click the /index.html link appears in App page. 6. 
The page should redirect to longhorn UI - https://rancher/k8s/clusters/c-aaaa/api/v1/namespaces/longhorn-system/services/http:longhorn-frontend:80/proxy/#/dashboard 3. Volume https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/volume/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/volume/ - Test cases for Volume # Test Case Test Instructions Expected Results 1 Check volume Details Prerequisite: -* Longhorn Nodes has node tags -* Node Disks has disk tags -* Backup target is set to NFS server, or S3 compatible target -1. Create a workload using Longhorn volume -2. Check volume details page -3. Create volume backup * Volume Details -* State should be Attached -* Health should be healthy -* Frontend should be Block Device + Test cases for Volume # Test Case Test Instructions Expected Results 1 Check volume Details Prerequisite: * Longhorn Nodes has node tags * Node Disks has disk tags * Backup target is set to NFS server, or S3 compatible target 1. Create a workload using Longhorn volume 2. Check volume details page 3. Create volume backup * Volume Details * State should be Attached * Health should be healthy * Frontend should be Block Device 5. Kubernetes https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/kubernetes/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/kubernetes/ - Dynamic provisioning with StorageClass Can create and use volume using StorageClass -Can create a new StorageClass use new parameters and it will take effect on the volume created by the storage class. -If the PV reclaim policy is delete, once PVC and PV are deleted, Longhorn volume should be deleted. -Static provisioning using Longhorn created PV/PVC PVC can be used by the new workload -Delete the PVC will not result in PV deletion + Dynamic provisioning with StorageClass Can create and use volume using StorageClass Can create a new StorageClass use new parameters and it will take effect on the volume created by the storage class. If the PV reclaim policy is delete, once PVC and PV are deleted, Longhorn volume should be deleted. Static provisioning using Longhorn created PV/PVC PVC can be used by the new workload Delete the PVC will not result in PV deletion 6. Backup https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/backup/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/backup/ - Automation Tests # Test name Description tag 1 test_backup Test basic backup -Setup: -1. Create a volume and attach to the current node -2. Run the test for all the available backupstores. -Steps: -1. Create a backup of volume -2. Restore the backup to a new volume -3. Attach the new volume and make sure the data is the same as the old one -4. Detach the volume and delete the backup. + Automation Tests # Test name Description tag 1 test_backup Test basic backup Setup: 1. Create a volume and attach to the current node 2. Run the test for all the available backupstores. Steps: 1. Create a backup of volume 2. Restore the backup to a new volume 3. Attach the new volume and make sure the data is the same as the old one 4. Detach the volume and delete the backup. 7. 
Node https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/node/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/node/ - UI specific test cases # Test Case Test Instructions Expected Results 1 Storage details * Prerequisites -* Longhorn Installed -1. Verify the allocated/used storage show the right data in node details page. -2. Create a volume of 20 GB and attach to a pod and verify the storage allocated/used is shown correctly. Without any volume, allocated should be 0 and on creating new volume it should be updated as per volume present. + UI specific test cases # Test Case Test Instructions Expected Results 1 Storage details * Prerequisites * Longhorn Installed 1. Verify the allocated/used storage show the right data in node details page. 2. Create a volume of 20 GB and attach to a pod and verify the storage allocated/used is shown correctly. Without any volume, allocated should be 0 and on creating new volume it should be updated as per volume present. 8. Scheduling https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/scheduling/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/scheduling/ - Manual Test Test name Prerequisite Expectation EKS across zone scheduling Prerequisite: -* EKS Cluster with 3 nodes across two AWS zones (zone#1, zone#2) -1. Create a volume with 2 replicas, and attach it to a node. -2. Delete a replica scheduled to each zone, repeat it few times -3. Scale volume replicas = 3 -4. Scale volume replicas to 4 * Volume replicas should be scheduled one per AWS zone + Manual Test Test name Prerequisite Expectation EKS across zone scheduling Prerequisite: * EKS Cluster with 3 nodes across two AWS zones (zone#1, zone#2) 1. Create a volume with 2 replicas, and attach it to a node. 2. Delete a replica scheduled to each zone, repeat it few times 3. Scale volume replicas = 3 4. Scale volume replicas to 4 * Volume replicas should be scheduled one per AWS zone 9. Upgrade https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/upgrade/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/upgrade/ - # Test name Description 1 Higher version of Longhorn engine and lower version of volume Test Longhorn upgrade -1. Create a volume, generate and write data into the volume. -2. Keep the volume attached, then upgrade Longhorn system. -3. Write data in volume. -4. Take snapshot#1. Compute the checksum#1 -5. Write data to volume. Compute the checksum#2 -6. Take backup -7. Revert to snapshot#1 -8. Restore the backup. 2 Restore the backup taken with older engine version 1. + # Test name Description 1 Higher version of Longhorn engine and lower version of volume Test Longhorn upgrade 1. Create a volume, generate and write data into the volume. 2. Keep the volume attached, then upgrade Longhorn system. 3. Write data in volume. 4. Take snapshot#1. Compute the checksum#1 5. Write data to volume. Compute the checksum#2 6. Take backup 7. Revert to snapshot#1 8. Restore the backup. 2 Restore the backup taken with older engine version 1. 
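The "write data, then compute the checksum" steps that recur in the upgrade cases above can be exercised with the same dd/md5sum pattern used elsewhere in these test cases; the pod name, mount path, and file name below are placeholders rather than values taken from the original instructions:

    # Write random data into the Longhorn volume mounted in the workload pod, then record its checksum
    kubectl exec <pod-name> -- dd if=/dev/urandom of=/mnt/<volume>/data bs=1M count=500 conv=fsync
    kubectl exec <pod-name> -- md5sum /mnt/<volume>/data   # compare this value again after snapshot revert or backup restore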
Access Longhorn GUI using Rancher proxy https://longhorn.github.io/longhorn-tests/manual/rancher-integration/access-lh-gui-using-rancher-ui/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/access-lh-gui-using-rancher-ui/ - Given Downstream (RKE2/RKE1/K3s) cluster in Rancher -AND Deploy Longhorn using either of Kubectl/helm/marketplace app -When Click the Longhorn app on Rancher UI -Then Navigates to Longhorn UI -AND User should be to do all the operations available on the Longhorn GUI -AND URL should be a suffix to the Rancher URL -AND NO error in the console logs + Given Downstream (RKE2/RKE1/K3s) cluster in Rancher AND Deploy Longhorn using either of Kubectl/helm/marketplace app When Click the Longhorn app on Rancher UI Then Navigates to Longhorn UI AND User should be to do all the operations available on the Longhorn GUI AND URL should be a suffix to the Rancher URL AND NO error in the console logs Automatically Upgrading Longhorn Engine Test https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/auto-upgrade-engine/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/auto-upgrade-engine/ - Longhorn version >= 1.1.1 Reference ticket 2152 -Test basic upgrade Install old Longhorn version. E.g., &lt;= v1.0.2 Create a volume, attach it to a pod, write some data. Create a DR volume and leave it in the detached state. Upgrade to Longhorn master Set setting concurrent automatic engine upgrade per node limit to 3 Verify that volumes&rsquo; engines are upgraded automatically. Test concurrent upgrade Create a StatefulSet of scale 10 using 10 Longhorn volume. + Longhorn version >= 1.1.1 Reference ticket 2152 Test basic upgrade Install old Longhorn version. E.g., &lt;= v1.0.2 Create a volume, attach it to a pod, write some data. Create a DR volume and leave it in the detached state. Upgrade to Longhorn master Set setting concurrent automatic engine upgrade per node limit to 3 Verify that volumes&rsquo; engines are upgraded automatically. Test concurrent upgrade Create a StatefulSet of scale 10 using 10 Longhorn volume. Backing Image Error Reporting and Retry @@ -306,11 +229,7 @@ Test basic upgrade Install old Longhorn version. E.g., &lt;= v1.0.2 Create a https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/csi-sanity-check/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/csi-sanity-check/ - Related issue https://github.com/longhorn/longhorn/issues/2076 -Run csi-sanity Prepare Longhorn cluster and setup backup target. -Make csi-sanity binary from csi-test. -On one of the cluster node, run csi-sanity binary. -csi-sanity -csi.endpoint /var/lib/kubelet/obsoleted-longhorn-plugins/driver.longhorn.io/csi.sock -ginkgo.skip=&#34;should create volume from an existing source snapshot|should return appropriate values|should succeed when creating snapshot with maximum-length name|should succeed when requesting to create a snapshot with already existing name and same source volume ID|should fail when requesting to create a snapshot with already existing name and different source volume ID&#34; NOTE + Related issue https://github.com/longhorn/longhorn/issues/2076 Run csi-sanity Prepare Longhorn cluster and setup backup target. Make csi-sanity binary from csi-test. On one of the cluster node, run csi-sanity binary. 
csi-sanity -csi.endpoint /var/lib/kubelet/obsoleted-longhorn-plugins/driver.longhorn.io/csi.sock -ginkgo.skip=&#34;should create volume from an existing source snapshot|should return appropriate values|should succeed when creating snapshot with maximum-length name|should succeed when requesting to create a snapshot with already existing name and same source volume ID|should fail when requesting to create a snapshot with already existing name and different source volume ID&#34; NOTE Degraded availability with added nodes @@ -331,37 +250,28 @@ csi-sanity -csi.endpoint /var/lib/kubelet/obsoleted-longhorn-plugins/driver.long https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/dr-volume-latest-backup-deletion/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/dr-volume-latest-backup-deletion/ - DR volume keeps getting the latest update from the related backups. Edge cases where the latest backup is deleted can be test as below. -Case 1: Create a volume and take multiple backups for the same. Delete the latest backup. Create another cluster and set the same backup store to access the backups created in step 1. Go to backup page and click on the backup. Verify the Create Disaster Recovery option is enabled for it. + DR volume keeps getting the latest update from the related backups. Edge cases where the latest backup is deleted can be test as below. Case 1: Create a volume and take multiple backups for the same. Delete the latest backup. Create another cluster and set the same backup store to access the backups created in step 1. Go to backup page and click on the backup. Verify the Create Disaster Recovery option is enabled for it. Drain using Rancher UI https://longhorn.github.io/longhorn-tests/manual/rancher-integration/drain-using-rancher-ui/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/drain-using-rancher-ui/ - Note: Enabling Delete Empty Dir Data is mandatory to drain a node if a pod is associated with any storage. -Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. Drain operation on single node using Rancher UI Given Single node (1 Worker) cluster with Longhorn installed -AND few RWO and RWX volumes attached with node/pod exists -AND 1 RWO and 1 RWX volumes unattached -When Drain the node with default values of Rancher UI + Note: Enabling Delete Empty Dir Data is mandatory to drain a node if a pod is associated with any storage. Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. 
Drain operation on single node using Rancher UI Given Single node (1 Worker) cluster with Longhorn installed AND few RWO and RWX volumes attached with node/pod exists AND 1 RWO and 1 RWX volumes unattached When Drain the node with default values of Rancher UI Extended CSI snapshot support to Longhorn snapshot https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/extend_csi_snapshot_support/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/extend_csi_snapshot_support/ - Related issue https://github.com/longhorn/longhorn/issues/2534 -Test Setup Deploy the CSI snapshot CRDs, Controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/ Deploy 4 VolumeSnapshotClass: kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-backup-1 driver: driver.longhorn.io deletionPolicy: Delete kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-backup-2 driver: driver.longhorn.io deletionPolicy: Delete parameters: type: bak kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-snapshot driver: driver.longhorn.io deletionPolicy: Delete parameters: type: snap kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: invalid-class driver: driver.longhorn.io deletionPolicy: Delete parameters: type: invalid Create Longhorn volume test-vol of 5GB. + Related issue https://github.com/longhorn/longhorn/issues/2534 Test Setup Deploy the CSI snapshot CRDs, Controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/ Deploy 4 VolumeSnapshotClass: kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-backup-1 driver: driver.longhorn.io deletionPolicy: Delete kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-backup-2 driver: driver.longhorn.io deletionPolicy: Delete parameters: type: bak kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-snapshot driver: driver.longhorn.io deletionPolicy: Delete parameters: type: snap kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: invalid-class driver: driver.longhorn.io deletionPolicy: Delete parameters: type: invalid Create Longhorn volume test-vol of 5GB. HA Volume Migration https://longhorn.github.io/longhorn-tests/manual/pre-release/ha/ha-volume-migration/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/ha/ha-volume-migration/ - Create a migratable volume: -Deploy a migratable StorageClass. e.g., https://github.com/longhorn/longhorn/blob/master/examples/rwx/storageclass-migratable.yaml Create a PVC with access mode ReadWriteMany via this StorageClass. Attach a volume to a node and wait for volume running. Then write some data into the volume. Here I would recommend directly restoring a volume (set fromBackup in the StorageClass) and attach it instead. -Start the migration by request attaching to another node for the volume. -Trigger the following scenarios then confirm or rollback the migration: + Create a migratable volume: Deploy a migratable StorageClass. 
e.g., https://github.com/longhorn/longhorn/blob/master/examples/rwx/storageclass-migratable.yaml Create a PVC with access mode ReadWriteMany via this StorageClass. Attach a volume to a node and wait for volume running. Then write some data into the volume. Here I would recommend directly restoring a volume (set fromBackup in the StorageClass) and attach it instead. Start the migration by request attaching to another node for the volume. Trigger the following scenarios then confirm or rollback the migration: Improve Node Failure Handling By Automatically Force Delete Terminating Pods of StatefulSet/Deployment On Downed Node @@ -375,22 +285,14 @@ Trigger the following scenarios then confirm or rollback the migration:https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/kubernetes-upgrade-test/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/kubernetes-upgrade-test/ - We also need to cover the Kubernetes upgrade process for supported Kubernetes versions, make sure pods and volumes work after a major version upgrade. -Related Issue https://github.com/longhorn/longhorn/issues/2566 -Test with K8s upgrade Create a K8s (Immediate prior version) cluster with 3 worker nodes and 1 control plane. Deploy Longhorn version (Immediate prior version) on the cluster. Create a volume and attach to a pod. Write data to the volume and compute the checksum. + We also need to cover the Kubernetes upgrade process for supported Kubernetes versions, make sure pods and volumes work after a major version upgrade. Related Issue https://github.com/longhorn/longhorn/issues/2566 Test with K8s upgrade Create a K8s (Immediate prior version) cluster with 3 worker nodes and 1 control plane. Deploy Longhorn version (Immediate prior version) on the cluster. Create a volume and attach to a pod. Write data to the volume and compute the checksum. Longhorn in a hardened cluster https://longhorn.github.io/longhorn-tests/manual/rancher-integration/lh-hardend-rancher/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/lh-hardend-rancher/ - Given Hardened Downstream (RKE2/RKE1/K3s) cluster in Rancher v2.6.x with CIS 1.6 as per Hardening guide -When Deploy Longhorn using Marketplace app -Then Longhorn should be deployed properly -AND Volume creation and other operations should work fine -Given Hardened Downstream (RKE2/RKE1/K3s) cluster in Rancher v2.7.x with CIS 1.6 or 1.20 or 1.23 as per Hardening guide for Rancher 2.7 -When Deploy Longhorn using Marketplace app -Then Longhorn should be deployed properly + Given Hardened Downstream (RKE2/RKE1/K3s) cluster in Rancher v2.6.x with CIS 1.6 as per Hardening guide When Deploy Longhorn using Marketplace app Then Longhorn should be deployed properly AND Volume creation and other operations should work fine Given Hardened Downstream (RKE2/RKE1/K3s) cluster in Rancher v2.7.x with CIS 1.6 or 1.20 or 1.23 as per Hardening guide for Rancher 2.7 When Deploy Longhorn using Marketplace app Then Longhorn should be deployed properly Longhorn installation multiple times @@ -404,46 +306,28 @@ Then Longhorn should be deployed properly https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/longhorn-upgrade-test/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/longhorn-upgrade-test/ - Setup 2 attached volumes with data. 2 detached volumes with data. 2 new volumes without data. 2 deployments of one pod. 
1 statefulset of 10 pods. Auto Salvage set to disable. Test After upgrade: -Make sure the existing instance managers didn&rsquo;t restart. Make sure pods didn&rsquo;t restart. Check the contents of the volumes. If the Engine API version is incompatible, manager cannot do anything about the attached volumes except detaching it. + Setup 2 attached volumes with data. 2 detached volumes with data. 2 new volumes without data. 2 deployments of one pod. 1 statefulset of 10 pods. Auto Salvage set to disable. Test After upgrade: Make sure the existing instance managers didn&rsquo;t restart. Make sure pods didn&rsquo;t restart. Check the contents of the volumes. If the Engine API version is incompatible, manager cannot do anything about the attached volumes except detaching it. Longhorn using fleet on multiple downstream clusters https://longhorn.github.io/longhorn-tests/manual/rancher-integration/fleet-deploy/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/fleet-deploy/ - reference: https://github.com/rancher/fleet -Test Longhorn deployment using fleet: Given Downstream multiple (RKE2/RKE1/K3s) clusters in Rancher -When Use fleet to deploy Longhorn -Then Longhorn should be deployed to all the cluster -AND Longhorn UI should be accessible using Rancher proxy -Test Longhorn uninstall using fleet: Given Downstream multiple (RKE2/RKE1/K3s) clusters in Rancher -AND Longhorn is deployed on all the clusters using fleet -When Use fleet to uninstall Longhorn -Then Longhorn should be uninstalled from all the cluster + reference: https://github.com/rancher/fleet Test Longhorn deployment using fleet: Given Downstream multiple (RKE2/RKE1/K3s) clusters in Rancher When Use fleet to deploy Longhorn Then Longhorn should be deployed to all the cluster AND Longhorn UI should be accessible using Rancher proxy Test Longhorn uninstall using fleet: Given Downstream multiple (RKE2/RKE1/K3s) clusters in Rancher AND Longhorn is deployed on all the clusters using fleet When Use fleet to uninstall Longhorn Then Longhorn should be uninstalled from all the cluster Longhorn with engine is not deployed on all the nodes https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/partial-engine-deployment/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/partial-engine-deployment/ - Related Issue https://github.com/longhorn/longhorn/issues/2081 -Scenarios: Case 1: Test volume operations when some of the engine image DaemonSet pods are miss scheduled Install Longhorn in a 3-node cluster: node-1, node-2, node-3 Create a volume, vol-1, of 3 replicas Create another volume, vol-2, of 3 replicas Taint node-1 with the taint: key=value:NoSchedule Check that all functions (attach, detach, snapshot, backup, expand, restore, creating DR volume, &hellip; ) are working ok for vol-1 Case 2: Test volume operations when some of engine image DaemonSet pods are not fully deployed Continue from case 1 Attach vol-1 to node-1. 
+ Related Issue https://github.com/longhorn/longhorn/issues/2081 Scenarios: Case 1: Test volume operations when some of the engine image DaemonSet pods are miss scheduled Install Longhorn in a 3-node cluster: node-1, node-2, node-3 Create a volume, vol-1, of 3 replicas Create another volume, vol-2, of 3 replicas Taint node-1 with the taint: key=value:NoSchedule Check that all functions (attach, detach, snapshot, backup, expand, restore, creating DR volume, &hellip; ) are working ok for vol-1 Case 2: Test volume operations when some of engine image DaemonSet pods are not fully deployed Continue from case 1 Attach vol-1 to node-1. Monitoring https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/monitoring/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/monitoring/ - Prometheus Support test cases Install the Prometheus Operator (include a role and service account for it). For example:apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: prometheus-operator namespace: default roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: prometheus-operator subjects: -- kind: ServiceAccount name: prometheus-operator namespace: default -&ndash; apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: prometheus-operator namespace: default rules: -- apiGroups: -- extensions resources: -- thirdpartyresources verbs: [&quot;&quot;] -- apiGroups: -- apiextensions.k8s.io resources: -- customresourcedefinitions verbs: [&quot;&quot;] + Prometheus Support test cases Install the Prometheus Operator (include a role and service account for it). For example:apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: prometheus-operator namespace: default roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: prometheus-operator subjects: - kind: ServiceAccount name: prometheus-operator namespace: default &ndash; apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: prometheus-operator namespace: default rules: - apiGroups: - extensions resources: - thirdpartyresources verbs: [&quot;&quot;] - apiGroups: - apiextensions.k8s.io resources: - customresourcedefinitions verbs: [&quot;&quot;] New Node with Custom Data Directory @@ -464,30 +348,21 @@ Scenarios: Case 1: Test volume operations when some of the engine image DaemonSe https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-disconnection/node-disconnection/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-disconnection/node-disconnection/ - https://github.com/longhorn/longhorn/issues/1545 For disconnect node : https://github.com/longhorn/longhorn/files/4864127/network_down.sh.zip -If auto-salvage is disabled, the auto-reattachment behavior after the node disconnection depends on all replicas are in ERROR state or not. -(1) If all replicas are in ERROR state, the volume would remain in detached/faulted state if auto-salvage is disabled. -(2) If there is any healthy replica, the volume would be auto-reattached even though auto-salvage is disabled. -What makes all replicas in ERROR state? 
When there is data writing during the disconnection: + https://github.com/longhorn/longhorn/issues/1545 For disconnect node : https://github.com/longhorn/longhorn/files/4864127/network_down.sh.zip If auto-salvage is disabled, the auto-reattachment behavior after the node disconnection depends on all replicas are in ERROR state or not. (1) If all replicas are in ERROR state, the volume would remain in detached/faulted state if auto-salvage is disabled. (2) If there is any healthy replica, the volume would be auto-reattached even though auto-salvage is disabled. What makes all replicas in ERROR state? When there is data writing during the disconnection: Node drain and deletion test https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/node-drain-deletion/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/node-drain-deletion/ - Drain with force Make sure the volumes on the drained/removed node can be detached or recovered correctly. The related issue: https://github.com/longhorn/longhorn/issues/1214 -Deploy a cluster contains 3 worker nodes N1, N2, N3. Deploy Longhorn. Create a 1-replica deployment with a 3-replica Longhorn volume. The volume is attached to N1. Write some data to the volume and get the md5sum. Force drain and remove N2, which contains one replica only. kubectl drain &lt;Node name&gt; --delete-emptydir-data=true --force=true --grace-period=-1 --ignore-daemonsets=true --timeout=&lt;Desired timeout in secs&gt; Wait for the volume Degraded. + Drain with force Make sure the volumes on the drained/removed node can be detached or recovered correctly. The related issue: https://github.com/longhorn/longhorn/issues/1214 Deploy a cluster contains 3 worker nodes N1, N2, N3. Deploy Longhorn. Create a 1-replica deployment with a 3-replica Longhorn volume. The volume is attached to N1. Write some data to the volume and get the md5sum. Force drain and remove N2, which contains one replica only. kubectl drain &lt;Node name&gt; --delete-emptydir-data=true --force=true --grace-period=-1 --ignore-daemonsets=true --timeout=&lt;Desired timeout in secs&gt; Wait for the volume Degraded. Physical node down https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/physical-node-down/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/physical-node-down/ - One physical node down should result in the state of that node change to Down. -When using with CSI driver, one node with controller (StatefulSet/Deployment) and pod down should result in Kubernetes migrate the pod to another node, and Longhorn volume should be able to be used on that node as well. Test scenarios for this are documented here. -Note: -In this case, RWX should be excluded. -Ref: https://github.com/longhorn/longhorn/issues/5900#issuecomment-1541360552 + One physical node down should result in the state of that node change to Down. When using with CSI driver, one node with controller (StatefulSet/Deployment) and pod down should result in Kubernetes migrate the pod to another node, and Longhorn volume should be able to be used on that node as well. Test scenarios for this are documented here. Note: In this case, RWX should be excluded. 
Ref: https://github.com/longhorn/longhorn/issues/5900#issuecomment-1541360552 Physical node reboot @@ -501,24 +376,21 @@ Ref: https://github.com/longhorn/longhorn/issues/5900#issuecomment-1541360552https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/priorityclass-default-setting/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/priorityclass-default-setting/ - There are three different cases we need to test when the user inputs a default setting for Priority Class: -Install Longhorn with no priority-class set in the default settings. The Priority Class setting should be empty after the installation completes according to the longhorn-ui, and the default Priority of all Pods in the longhorn-system namespace should be 0: ~ kubectl -n longhorn-system describe pods | grep Priority # should be repeated many times Priority: 0 Install Longhorn with a nonexistent priority-class in the default settings. + There are three different cases we need to test when the user inputs a default setting for Priority Class: Install Longhorn with no priority-class set in the default settings. The Priority Class setting should be empty after the installation completes according to the longhorn-ui, and the default Priority of all Pods in the longhorn-system namespace should be 0: ~ kubectl -n longhorn-system describe pods | grep Priority # should be repeated many times Priority: 0 Install Longhorn with a nonexistent priority-class in the default settings. Prometheus Support https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/prometheus_support/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/prometheus_support/ - Prometheus Support allows user to monitor the longhorn metrics. The details are available at https://longhorn.io/docs/1.1.0/monitoring/ -Monitor longhorn Deploy the Prometheus-operator, ServiceMonitor pointing to longhorn-backend and Prometheus as mentioned in the doc. Create an ingress pointing to Prometheus service. Access the Prometheus web UI using the ingress created in the step 2. Select the metrics from below to monitor the longhorn resources. longhorn_volume_actual_size_bytes longhorn_volume_capacity_bytes longhorn_volume_robustness longhorn_volume_state longhorn_instance_manager_cpu_requests_millicpu longhorn_instance_manager_cpu_usage_millicpu longhorn_instance_manager_memory_requests_bytes longhorn_instance_manager_memory_usage_bytes longhorn_manager_cpu_usage_millicpu longhorn_manager_memory_usage_bytes longhorn_node_count_total longhorn_node_status longhorn_node_cpu_capacity_millicpu longhorn_node_cpu_usage_millicpu longhorn_node_memory_capacity_bytes longhorn_node_memory_usage_bytes longhorn_node_storage_capacity_bytes longhorn_node_storage_reservation_bytes longhorn_node_storage_usage_bytes longhorn_disk_capacity_bytes longhorn_disk_reservation_bytes longhorn_disk_usage_bytes Deploy workloads which use Longhorn volumes into the cluster. + Prometheus Support allows user to monitor the longhorn metrics. The details are available at https://longhorn.io/docs/1.1.0/monitoring/ Monitor longhorn Deploy the Prometheus-operator, ServiceMonitor pointing to longhorn-backend and Prometheus as mentioned in the doc. Create an ingress pointing to Prometheus service. Access the Prometheus web UI using the ingress created in the step 2. Select the metrics from below to monitor the longhorn resources. 
longhorn_volume_actual_size_bytes longhorn_volume_capacity_bytes longhorn_volume_robustness longhorn_volume_state longhorn_instance_manager_cpu_requests_millicpu longhorn_instance_manager_cpu_usage_millicpu longhorn_instance_manager_memory_requests_bytes longhorn_instance_manager_memory_usage_bytes longhorn_manager_cpu_usage_millicpu longhorn_manager_memory_usage_bytes longhorn_node_count_total longhorn_node_status longhorn_node_cpu_capacity_millicpu longhorn_node_cpu_usage_millicpu longhorn_node_memory_capacity_bytes longhorn_node_memory_usage_bytes longhorn_node_storage_capacity_bytes longhorn_node_storage_reservation_bytes longhorn_node_storage_usage_bytes longhorn_disk_capacity_bytes longhorn_disk_reservation_bytes longhorn_disk_usage_bytes Deploy workloads which use Longhorn volumes into the cluster. PVC provisioning with insufficient storage https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/pvc_provisioning_with_insufficient_storage/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/pvc_provisioning_with_insufficient_storage/ - Related Issue: https://github.com/longhorn/longhorn/issues/4654 https://github.com/longhorn/longhorn/issues/3529 Root Cause Analysis https://github.com/longhorn/longhorn/issues/4654#issuecomment-1264870672 This case need to be tested on both RWO/RWX volumes -Create a PVC with size larger than 8589934591 GiB. Deployment keep in pending status, RWO/RWX volume will keep in a create -&gt; delete loop. Create a PVC with size &lt;= 8589934591 GiB, but greater than the actual available space size. RWO/RWX volume will be created, and volume will have annotation &ldquo;longhorn.io/volume-scheduling-error&rdquo;: &ldquo;insufficient storage volume scheduling failure&rdquo; in it. + Related Issue: https://github.com/longhorn/longhorn/issues/4654 https://github.com/longhorn/longhorn/issues/3529 Root Cause Analysis https://github.com/longhorn/longhorn/issues/4654#issuecomment-1264870672 This case need to be tested on both RWO/RWX volumes Create a PVC with size larger than 8589934591 GiB. Deployment keep in pending status, RWO/RWX volume will keep in a create -&gt; delete loop. Create a PVC with size &lt;= 8589934591 GiB, but greater than the actual available space size. RWO/RWX volume will be created, and volume will have annotation &ldquo;longhorn.io/volume-scheduling-error&rdquo;: &ldquo;insufficient storage volume scheduling failure&rdquo; in it. Re-deploy CSI components when their images change @@ -532,8 +404,7 @@ Create a PVC with size larger than 8589934591 GiB. Deployment keep in pending st https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/recurring-backup-job-interruptions/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/recurring-backup-job-interruptions/ - Related Issue https://github.com/longhorn/longhorn/issues/1882 -Scenario 1- Allow Recurring Job While Volume Is Detached disabled, attached pod scaled down while the recurring backup was in progress. Create a volume, attach to a pod of a statefulSet, and write 800 Mi data into it. Set a recurring job. While the recurring job is in progress, scale down the pod to 0 of the statefulSet. Volume first detached and cron job gets finished saying unable to complete the backup. 
+ Related Issue https://github.com/longhorn/longhorn/issues/1882 Scenario 1- Allow Recurring Job While Volume Is Detached disabled, attached pod scaled down while the recurring backup was in progress. Create a volume, attach to a pod of a statefulSet, and write 800 Mi data into it. Set a recurring job. While the recurring job is in progress, scale down the pod to 0 of the statefulSet. Volume first detached and cron job gets finished saying unable to complete the backup. Replica Rebuilding @@ -554,16 +425,14 @@ Scenario 1- Allow Recurring Job While Volume Is Detached disabled, attached pod https://longhorn.github.io/longhorn-tests/manual/pre-release/cluster-restore/restore-to-an-old-cluster/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/cluster-restore/restore-to-an-old-cluster/ - Notice that the behaviors will be different if the cluster node roles are different. e.g., A cluster contains 1 dedicated master node + 3 worker node is different from a cluster contains 3 nodes which are both master and worker. This test may need to be validated for both kind of cluster. -Node creation and deletion Deploy a 3-worker-node cluster then install Longhorn system. Deploy some workloads using Longhorn volumes then write some data. + Notice that the behaviors will be different if the cluster node roles are different. e.g., A cluster contains 1 dedicated master node + 3 worker node is different from a cluster contains 3 nodes which are both master and worker. This test may need to be validated for both kind of cluster. Node creation and deletion Deploy a 3-worker-node cluster then install Longhorn system. Deploy some workloads using Longhorn volumes then write some data. Return an error when fail to remount a volume https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/error-fail-remount/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/error-fail-remount/ - Case 1: Volume with a corrupted filesystem try to remount Steps to reproduce bug: -Create a volume of size 1GB, say terminate-immediatly volume. Create PV/PVC from the volume terminate-immediatly Create a deployment of 1 pod with image ubuntu:xenial and the PVC terminate-immediatly in default namespace Find the node on which the pod is scheduled to. Let&rsquo;s say the node is Node-1 ssh into Node-1 destroy the filesystem of terminate-immediatly by running command dd if=/dev/zero of=/dev/longhorn/terminate-immediatly Find and kill the engine instance manager in Node-X. + Case 1: Volume with a corrupted filesystem try to remount Steps to reproduce bug: Create a volume of size 1GB, say terminate-immediatly volume. Create PV/PVC from the volume terminate-immediatly Create a deployment of 1 pod with image ubuntu:xenial and the PVC terminate-immediatly in default namespace Find the node on which the pod is scheduled to. Let&rsquo;s say the node is Node-1 ssh into Node-1 destroy the filesystem of terminate-immediatly by running command dd if=/dev/zero of=/dev/longhorn/terminate-immediatly Find and kill the engine instance manager in Node-X. Reusing failed replica for rebuilding @@ -577,83 +446,56 @@ Create a volume of size 1GB, say terminate-immediatly volume. 
Create PV/PVC from https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/tolerations_priorityclass_setting/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/tolerations_priorityclass_setting/ - Related issue https://github.com/longhorn/longhorn/issues/2120 -Manual Tests: -Case 1: Existing Longhorn installation Install Longhorn master. Change toleration in UI setting Verify that longhorn.io/last-applied-tolerations annotation and toleration of manager, drive deployer, UI are not changed. Verify that longhorn.io/last-applied-tolerations annotation and toleration for managed components (CSI components, IM pods, share manager pod, EI daemonset, backing-image-manager, cronjob) are updated correctly Case 2: New installation by Helm Install Longhorn master, set tolerations like: defaultSettings: taintToleration: &#34;key=value:NoSchedule&#34; longhornManager: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule longhornDriver: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule longhornUI: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule Verify that the toleration is added for: IM pods, Share Manager pods, CSI deployments, CSI daemonset, the backup jobs, manager, drive deployer, UI Uninstall the Helm release. + Related issue https://github.com/longhorn/longhorn/issues/2120 Manual Tests: Case 1: Existing Longhorn installation Install Longhorn master. Change toleration in UI setting Verify that longhorn.io/last-applied-tolerations annotation and toleration of manager, drive deployer, UI are not changed. Verify that longhorn.io/last-applied-tolerations annotation and toleration for managed components (CSI components, IM pods, share manager pod, EI daemonset, backing-image-manager, cronjob) are updated correctly Case 2: New installation by Helm Install Longhorn master, set tolerations like: defaultSettings: taintToleration: &#34;key=value:NoSchedule&#34; longhornManager: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule longhornDriver: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule longhornUI: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule Verify that the toleration is added for: IM pods, Share Manager pods, CSI deployments, CSI daemonset, the backup jobs, manager, drive deployer, UI Uninstall the Helm release. Setup and test storage network https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-storage-network/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-storage-network/ - Related issue https://github.com/longhorn/longhorn/issues/2285 -Test storage network Create AWS instances Given Create VPC. -VPC only IPv4 CIDR 10.0.0.0/16 And Create an internet gateway. -Attach to VPC And Add the internet gateway to the VPC Main route table, Routes. -Destination 0.0.0.0/0 And Create 2 subnets in the VPC. -Subnet-1: 10.0.1.0/24 Subnet-2: 10.0.2.0/24 And Launch 3 EC2 instances. -Use the created VPC Use subnet-1 for network interface 1 Use subnet-2 for network interface 2 Disable Auto-assign public IP Add security group inbound rule to allow All traffic from Anywhere-IPv4 Stop Source/destination check And Create 3 elastic IPs. 
+ Related issue https://github.com/longhorn/longhorn/issues/2285 Test storage network Create AWS instances Given Create VPC. VPC only IPv4 CIDR 10.0.0.0/16 And Create an internet gateway. Attach to VPC And Add the internet gateway to the VPC Main route table, Routes. Destination 0.0.0.0/0 And Create 2 subnets in the VPC. Subnet-1: 10.0.1.0/24 Subnet-2: 10.0.2.0/24 And Launch 3 EC2 instances. Use the created VPC Use subnet-1 for network interface 1 Use subnet-2 for network interface 2 Disable Auto-assign public IP Add security group inbound rule to allow All traffic from Anywhere-IPv4 Stop Source/destination check And Create 3 elastic IPs. Setup and test storage network when Multus version is above v4.0.0 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-storage-network/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-storage-network/ - Related issue https://github.com/longhorn/longhorn/issues/6953 -Test storage network Create AWS instances Given Create VPC. -VPC only IPv4 CIDR 10.0.0.0/16 And Create an internet gateway. -Attach to VPC And Add the internet gateway to the VPC Main route table, Routes. -Destination 0.0.0.0/0 And Create 2 subnets in the VPC. -Subnet-1: 10.0.1.0/24 Subnet-2: 10.0.2.0/24 And Launch 3 EC2 instances. -Use the created VPC Use subnet-1 for network interface 1 Use subnet-2 for network interface 2 Disable Auto-assign public IP Add security group inbound rule to allow All traffic from Anywhere-IPv4 Stop Source/destination check And Create 3 elastic IPs. + Related issue https://github.com/longhorn/longhorn/issues/6953 Test storage network Create AWS instances Given Create VPC. VPC only IPv4 CIDR 10.0.0.0/16 And Create an internet gateway. Attach to VPC And Add the internet gateway to the VPC Main route table, Routes. Destination 0.0.0.0/0 And Create 2 subnets in the VPC. Subnet-1: 10.0.1.0/24 Subnet-2: 10.0.2.0/24 And Launch 3 EC2 instances. Use the created VPC Use subnet-1 for network interface 1 Use subnet-2 for network interface 2 Disable Auto-assign public IP Add security group inbound rule to allow All traffic from Anywhere-IPv4 Stop Source/destination check And Create 3 elastic IPs. Single replica node down https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/single-replica-node-down/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/single-replica-node-down/ - Related Issues https://github.com/longhorn/longhorn/issues/2329 https://github.com/longhorn/longhorn/issues/2309 https://github.com/longhorn/longhorn/issues/3957 -Default Setting Automatic salvage is enabled. -Node restart/down scenario with Pod Deletion Policy When Node is Down set to default value do-nothing. Create RWO|RWX volume with replica count = 1 &amp; data locality = enabled|disabled|strict-local. For data locality = strict-local, use RWO volume to do test. Create deployment|statefulset for volume. Power down node of volume/replica. The workload pod will get stuck in the terminating state. Volume will fail to attach since volume is not ready (i. + Related Issues https://github.com/longhorn/longhorn/issues/2329 https://github.com/longhorn/longhorn/issues/2309 https://github.com/longhorn/longhorn/issues/3957 Default Setting Automatic salvage is enabled. Node restart/down scenario with Pod Deletion Policy When Node is Down set to default value do-nothing. 
Create RWO|RWX volume with replica count = 1 &amp; data locality = enabled|disabled|strict-local. For data locality = strict-local, use RWO volume to do test. Create deployment|statefulset for volume. Power down node of volume/replica. The workload pod will get stuck in the terminating state. Volume will fail to attach since volume is not ready (i. Snapshot while writing data in the volume https://longhorn.github.io/longhorn-tests/manual/pre-release/basic-operations/snapshot-while-writing-data/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/basic-operations/snapshot-while-writing-data/ - Related issue: https://github.com/longhorn/longhorn/issues/2187 -Scenario Create a kubernetes pod + pvc that mounts a Longhorn volume. Write 5 Gib into the pod using dd if=/dev/urandom of=/mnt/&lt;volume&gt; count=5000 bs=1M conv=fsync status=progress While running the above command initiate a snapshot. Verify the logs of the instance-manager using kubetail instance-manager -n longhorn-system. There should some logs related to freezing and unfreezing the filesystem. Like Froze filesystem of volume mounted ... Verify snapshot succeeded and dd operation will complete. + Related issue: https://github.com/longhorn/longhorn/issues/2187 Scenario Create a kubernetes pod + pvc that mounts a Longhorn volume. Write 5 Gib into the pod using dd if=/dev/urandom of=/mnt/&lt;volume&gt; count=5000 bs=1M conv=fsync status=progress While running the above command initiate a snapshot. Verify the logs of the instance-manager using kubetail instance-manager -n longhorn-system. There should some logs related to freezing and unfreezing the filesystem. Like Froze filesystem of volume mounted ... Verify snapshot succeeded and dd operation will complete. Storage Network Test https://longhorn.github.io/longhorn-tests/manual/pre-release/basic-operations/storage-network/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/basic-operations/storage-network/ - Related issue: https://github.com/longhorn/longhorn/issues/2285 -Test Multus version below v4.0.0 Given Set up the Longhorn environment as mentioned here -When Run Longhorn core tests on the environment. -Then All the tests should pass. -Related issue: https://github.com/longhorn/longhorn/issues/6953 -Test Multus version above v4.0.0 Given Set up the Longhorn environment as mentioned here -When Run Longhorn core tests on the environment. -Then All the tests should pass. + Related issue: https://github.com/longhorn/longhorn/issues/2285 Test Multus version below v4.0.0 Given Set up the Longhorn environment as mentioned here When Run Longhorn core tests on the environment. Then All the tests should pass. Related issue: https://github.com/longhorn/longhorn/issues/6953 Test Multus version above v4.0.0 Given Set up the Longhorn environment as mentioned here When Run Longhorn core tests on the environment. Then All the tests should pass. Support Kubelet Volume Metrics https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/kubelet_volume_metrics/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/kubelet_volume_metrics/ - Intro Kubelet exposes kubelet_volume_stats_* metrics. Those metrics measure PVC&rsquo;s filesystem related information inside a Longhorn block device. 
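For the snapshot-while-writing case above, the write and the instance-manager log check can be run as below. The pod name, the mount path placeholder, the instance-manager label selector, and the use of plain kubectl logs instead of kubetail are assumptions.

```bash
# Write ~5 GiB into the mounted Longhorn volume from inside the workload pod.
kubectl exec -it <pod> -- dd if=/dev/urandom of=/mnt/<volume> count=5000 bs=1M conv=fsync status=progress

# While dd is running, take a snapshot, then look for the freeze/unfreeze messages.
for pod in $(kubectl -n longhorn-system get pods -l longhorn.io/component=instance-manager -o name); do
  kubectl -n longhorn-system logs "$pod" | grep -iE 'froze|freez' || true
done
```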
-Test steps: Create a cluster and set up this monitoring system: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack Install Longhorn. Deploy some workloads using Longhorn volumes. Make sure there are some workloads using Longhorn PVCs in volumeMode: Block and some workloads using Longhorn PVCs in volumeMode: Filesystem. See https://longhorn.io/docs/1.0.2/references/examples/ for examples. Create ingress to Prometheus server and Grafana. Navigate to Prometheus server, verify that all Longhorn PVCs in volumeMode: Filesystem show up in metrics: kubelet_volume_stats_capacity_bytes kubelet_volume_stats_available_bytes kubelet_volume_stats_used_bytes kubelet_volume_stats_inodes kubelet_volume_stats_inodes_free kubelet_volume_stats_inodes_used. + Intro Kubelet exposes kubelet_volume_stats_* metrics. Those metrics measure PVC&rsquo;s filesystem related information inside a Longhorn block device. Test steps: Create a cluster and set up this monitoring system: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack Install Longhorn. Deploy some workloads using Longhorn volumes. Make sure there are some workloads using Longhorn PVCs in volumeMode: Block and some workloads using Longhorn PVCs in volumeMode: Filesystem. See https://longhorn.io/docs/1.0.2/references/examples/ for examples. Create ingress to Prometheus server and Grafana. Navigate to Prometheus server, verify that all Longhorn PVCs in volumeMode: Filesystem show up in metrics: kubelet_volume_stats_capacity_bytes kubelet_volume_stats_available_bytes kubelet_volume_stats_used_bytes kubelet_volume_stats_inodes kubelet_volume_stats_inodes_free kubelet_volume_stats_inodes_used. Test `Rebuild` in volume.meta blocks engine start https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-rebuild-in-meta-blocks-engine-start/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-rebuild-in-meta-blocks-engine-start/ - Related issue https://github.com/longhorn/longhorn/issues/6626 -Test with patched image Given a patched longhorn-engine image with the following code change. -diff --git a/pkg/sync/sync.go b/pkg/sync/sync.go index b48ddd46..c4523f11 100644 --- a/pkg/sync/sync.go +++ b/pkg/sync/sync.go @@ -534,9 +534,9 @@ func (t *Task) reloadAndVerify(address, instanceName string, repClient *replicaC return err } - if err := repClient.SetRebuilding(false); err != nil { - return err - } + // if err := repClient.SetRebuilding(false); err != nil { + // return err + // } return nil } And a patched longhorn-instance-manager image with the longhorn-engine vendor updated. + Related issue https://github.com/longhorn/longhorn/issues/6626 Test with patched image Given a patched longhorn-engine image with the following code change. diff --git a/pkg/sync/sync.go b/pkg/sync/sync.go index b48ddd46..c4523f11 100644 --- a/pkg/sync/sync.go +++ b/pkg/sync/sync.go @@ -534,9 +534,9 @@ func (t *Task) reloadAndVerify(address, instanceName string, repClient *replicaC return err } - if err := repClient.SetRebuilding(false); err != nil { - return err - } + // if err := repClient.SetRebuilding(false); err != nil { + // return err + // } return nil } And a patched longhorn-instance-manager image with the longhorn-engine vendor updated. 
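Once Prometheus is reachable through the ingress created in the kubelet-volume-metrics steps above, the same kubelet_volume_stats_* series can also be checked from the command line; the Prometheus URL below is a placeholder.

```bash
PROM_URL=http://<prometheus-ingress>   # placeholder

for metric in kubelet_volume_stats_capacity_bytes kubelet_volume_stats_available_bytes \
              kubelet_volume_stats_used_bytes kubelet_volume_stats_inodes \
              kubelet_volume_stats_inodes_free kubelet_volume_stats_inodes_used; do
  echo "=== $metric ==="
  # Instant query; every Filesystem-mode Longhorn PVC should show up in the result.
  curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$metric" \
    | grep -o '"persistentvolumeclaim":"[^"]*"' | sort -u
done
```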
Test access style for S3 compatible backupstore @@ -667,9 +509,7 @@ diff --git a/pkg/sync/sync.go b/pkg/sync/sync.go index b48ddd46..c4523f11 100644 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/additional-printer-columns/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/additional-printer-columns/ - For each of the case below: -Fresh installation of Longhorn. (make sure to delete all Longhorn CRDs before installation) Upgrade from older version. Run: -kubectl get &lt;LONGHORN-CRD&gt; -n longhorn-system Verify that the output contains information as specify in the additionalPrinerColumns at here + For each of the case below: Fresh installation of Longhorn. (make sure to delete all Longhorn CRDs before installation) Upgrade from older version. Run: kubectl get &lt;LONGHORN-CRD&gt; -n longhorn-system Verify that the output contains information as specify in the additionalPrinerColumns at here Test backing image @@ -704,18 +544,14 @@ kubectl get &lt;LONGHORN-CRD&gt; -n longhorn-system Verify that the outp https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.3/test-backing-image-space-usage/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.3/test-backing-image-space-usage/ - Prerequisite A sparse file should be prepared before test. e.g.: -~ touch empty-filesystem.raw ~ truncate -s 500M empty-filesystem.raw ~ mkfs.ext4 empty-filesystem.raw mke2fs 1.46.1 (9-Feb-2021) Creating filesystem with 512000 1k blocks and 128016 inodes Filesystem UUID: fe6cfb58-134a-42b3-afab-59474d9515e0 Superblock backups stored on blocks: 8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409 Allocating group tables: done Writing inode tables: done Creating journal (8192 blocks): done Writing superblocks and filesystem accounting information: done ~ shasum -a 512 empty-filesystem. + Prerequisite A sparse file should be prepared before test. e.g.: ~ touch empty-filesystem.raw ~ truncate -s 500M empty-filesystem.raw ~ mkfs.ext4 empty-filesystem.raw mke2fs 1.46.1 (9-Feb-2021) Creating filesystem with 512000 1k blocks and 128016 inodes Filesystem UUID: fe6cfb58-134a-42b3-afab-59474d9515e0 Superblock backups stored on blocks: 8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409 Allocating group tables: done Writing inode tables: done Creating journal (8192 blocks): done Writing superblocks and filesystem accounting information: done ~ shasum -a 512 empty-filesystem. Test Backup Creation With Old Engine Image https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/backup-creation-with-old-engine-image/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/backup-creation-with-old-engine-image/ - Related issue https://github.com/longhorn/longhorn/issues/2897 -Test Step Given with Longhorn v1.2.0-rc2 or above. And deploy engine image oldEI older than v1.2.0 (for example: longhornio/longhorn-engine:v1.1.2). And create volume vol-old-engine. And attach volume vol-old-engine to one of a node. And upgrade volume vol-old-engine to engine image oldEI. -When create backup of volume vol-old-engine. -Then watch kubectl kubectl get backups.longhorn.io -l backup-volume=vol-old-engine -w. And should see two backups temporarily (in transition state). And should see only one backup be left after a while. 
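A quick way to run the additional-printer-columns check above across every Longhorn CRD is a small loop; nothing here is version specific.

```bash
# Print every Longhorn custom resource type so the additional printer columns are visible.
for crd in $(kubectl api-resources --api-group=longhorn.io -o name); do
  echo "=== $crd ==="
  kubectl get "$crd" -n longhorn-system
done
```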
+ Related issue https://github.com/longhorn/longhorn/issues/2897 Test Step Given with Longhorn v1.2.0-rc2 or above. And deploy engine image oldEI older than v1.2.0 (for example: longhornio/longhorn-engine:v1.1.2). And create volume vol-old-engine. And attach volume vol-old-engine to one of a node. And upgrade volume vol-old-engine to engine image oldEI. When create backup of volume vol-old-engine. Then watch kubectl kubectl get backups.longhorn.io -l backup-volume=vol-old-engine -w. And should see two backups temporarily (in transition state). And should see only one backup be left after a while. Test backup listing S3/NFS @@ -729,50 +565,35 @@ Then watch kubectl kubectl get backups.longhorn.io -l backup-volume=vol-old-engi https://longhorn.github.io/longhorn-tests/manual/test-cases-to-reproduce-attach-detach-issues/attachment-detachment-issues-reproducibility/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/test-cases-to-reproduce-attach-detach-issues/attachment-detachment-issues-reproducibility/ - Prerequisite: Have an environment with just with 2 worker nodes or taint 1 out of 3 worker node to be NoExecute &amp; NoSchedule. This will serve as a constrained fallback and limited source of recovery in the event of failure. -1. Kill the engines and instance manager repeatedly Given 1 RWO and 1 RWX volume is attached to a pod. And Both the volumes have 2 replicas. And Random data is continuously being written to the volume using command dd if=/dev/urandom of=file1 count=100 bs=1M conv=fsync status=progress oflag=direct,sync + Prerequisite: Have an environment with just with 2 worker nodes or taint 1 out of 3 worker node to be NoExecute &amp; NoSchedule. This will serve as a constrained fallback and limited source of recovery in the event of failure. 1. Kill the engines and instance manager repeatedly Given 1 RWO and 1 RWX volume is attached to a pod. And Both the volumes have 2 replicas. And Random data is continuously being written to the volume using command dd if=/dev/urandom of=file1 count=100 bs=1M conv=fsync status=progress oflag=direct,sync Test CronJob For Volumes That Are Detached For A Long Time https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/delete-cronjob-for-detached-volumes/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/delete-cronjob-for-detached-volumes/ - Related issue https://github.com/longhorn/longhorn/issues/2513 -Steps Make sure the setting Allow Recurring Job While Volume Is Detached is disabled Create a volume. Attach to a node. Create a recurring backup job that run every minute. Wait for the cronjob to be scheduled a few times. Detach the volume. Verify that the CronJob get deleted. Wait 2 hours (&gt; 100 mins). Attach the volume to a node. Verify that the CronJob get created. Verify that Kubernetes schedules a run for the CronJob at the beginning of the next minute. + Related issue https://github.com/longhorn/longhorn/issues/2513 Steps Make sure the setting Allow Recurring Job While Volume Is Detached is disabled Create a volume. Attach to a node. Create a recurring backup job that run every minute. Wait for the cronjob to be scheduled a few times. Detach the volume. Verify that the CronJob get deleted. Wait 2 hours (&gt; 100 mins). Attach the volume to a node. Verify that the CronJob get created. Verify that Kubernetes schedules a run for the CronJob at the beginning of the next minute. 
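For the attach/detach reproduction prerequisite above, the node taint and the continuous write can be set up roughly as follows; the taint key/value, the node name, and the in-pod file path are arbitrary examples.

```bash
# Constrain scheduling to 2 of the 3 worker nodes (taint key/value are arbitrary examples).
kubectl taint nodes <worker-3> longhorn-test=exclude:NoSchedule
kubectl taint nodes <worker-3> longhorn-test=exclude:NoExecute

# Continuous random writes inside the pod that mounts the Longhorn volume.
kubectl exec -it <pod> -- dd if=/dev/urandom of=/data/file1 count=100 bs=1M conv=fsync status=progress oflag=direct,sync
```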
Test CSI plugin liveness probe https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-csi-plugin-liveness-probe/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-csi-plugin-liveness-probe/ - Related discussion https://github.com/longhorn/longhorn/issues/3907 -Test CSI plugin liveness probe should recover CSI socket file Given healthy Longhorn cluster -When delete the Longhorn CSI socket file on one of the node(node-1). rm /var/lib/kubelet/plugins/driver.longhorn.io/csi.sock -Then the longhorn-csi-plugin-* pod on node-1 should be restarted. -And the csi-provisioner-* pod on node-1 should be restarted. -And the csi-resizer-* pod on node-1 should be restarted. -And the csi-snapshotter-* pod on node-1 should be restarted. -And the csi-attacher-* pod on node-1 should be restarted. + Related discussion https://github.com/longhorn/longhorn/issues/3907 Test CSI plugin liveness probe should recover CSI socket file Given healthy Longhorn cluster When delete the Longhorn CSI socket file on one of the node(node-1). rm /var/lib/kubelet/plugins/driver.longhorn.io/csi.sock Then the longhorn-csi-plugin-* pod on node-1 should be restarted. And the csi-provisioner-* pod on node-1 should be restarted. And the csi-resizer-* pod on node-1 should be restarted. And the csi-snapshotter-* pod on node-1 should be restarted. And the csi-attacher-* pod on node-1 should be restarted. Test Disable IPv6 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/disable_ipv6/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/disable_ipv6/ - Related issue https://github.com/longhorn/longhorn/issues/2136 -https://github.com/longhorn/longhorn/issues/2197 -Longhorn v1.1.1 should work with IPv6 disabled. -Scenario Install Kubernetes Disable IPv6 on all the worker nodes using the following Go to the folder /etc/default In the grub file, edit the value GRUB_CMDLINE_LINUX_DEFAULT=&#34;ipv6.disable=1&#34; Once the file is saved update by the command update-grub Reboot the node and once the node becomes active, Use the command cat /proc/cmdline to verify &#34;ipv6.disable=1&#34; is reflected in the values Deploy Longhorn and test basic use cases. + Related issue https://github.com/longhorn/longhorn/issues/2136 https://github.com/longhorn/longhorn/issues/2197 Longhorn v1.1.1 should work with IPv6 disabled. Scenario Install Kubernetes Disable IPv6 on all the worker nodes using the following Go to the folder /etc/default In the grub file, edit the value GRUB_CMDLINE_LINUX_DEFAULT=&#34;ipv6.disable=1&#34; Once the file is saved update by the command update-grub Reboot the node and once the node becomes active, Use the command cat /proc/cmdline to verify &#34;ipv6.disable=1&#34; is reflected in the values Deploy Longhorn and test basic use cases. Test engine binary recovery https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-engine-binary-recovery/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-engine-binary-recovery/ - Related issue https://github.com/longhorn/longhorn/issues/4380 -Steps Test remove engine binary on host should recover Given EngineImage custom resource deployed -&gt; kubectl -n longhorn-system get engineimage NAME STATE IMAGE REFCOUNT BUILDDATE AGE ei-b907910b deployed longhornio/longhorn-engine:master-head 0 3d23h 2m25s And engine image pods Ready are 1/1. 
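The CSI liveness-probe case above comes down to removing the socket on one node and watching the plugin and sidecars restart; a short sketch follows (the rm runs on node-1 itself, the watch from a workstation).

```bash
# On node-1 (via SSH): remove the Longhorn CSI socket.
sudo rm /var/lib/kubelet/plugins/driver.longhorn.io/csi.sock

# From a workstation: watch the CSI pods scheduled on node-1 restart.
kubectl -n longhorn-system get pods -o wide -w | \
  grep -E 'longhorn-csi-plugin|csi-provisioner|csi-resizer|csi-snapshotter|csi-attacher'
```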
-&gt; kubectl -n longhorn-system get pod | grep engine-image engine-image-ei-b907910b-g4kpd 1/1 Running 0 2m43s engine-image-ei-b907910b-46k6t 1/1 Running 0 2m43s engine-image-ei-b907910b-t6wnd 1/1 Running 0 2m43s When Delete engine binary on host + Related issue https://github.com/longhorn/longhorn/issues/4380 Steps Test remove engine binary on host should recover Given EngineImage custom resource deployed &gt; kubectl -n longhorn-system get engineimage NAME STATE IMAGE REFCOUNT BUILDDATE AGE ei-b907910b deployed longhornio/longhorn-engine:master-head 0 3d23h 2m25s And engine image pods Ready are 1/1. &gt; kubectl -n longhorn-system get pod | grep engine-image engine-image-ei-b907910b-g4kpd 1/1 Running 0 2m43s engine-image-ei-b907910b-46k6t 1/1 Running 0 2m43s engine-image-ei-b907910b-t6wnd 1/1 Running 0 2m43s When Delete engine binary on host Test Engine Crash During Live Upgrade @@ -786,82 +607,49 @@ Steps Test remove engine binary on host should recover Given EngineImage custom https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/test-file-sync-cancellation/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/test-file-sync-cancellation/ - Related issue https://github.com/longhorn/longhorn/issues/2416 -Test step For test convenience, manually launch the backing image manager pods: apiVersion: apps/v1 kind: DaemonSet metadata: labels: app: backing-image-manager name: backing-image-manager namespace: longhorn-system spec: selector: matchLabels: app: backing-image-manager template: metadata: labels: app: backing-image-manager spec: containers: - name: backing-image-manager image: longhornio/backing-image-manager:master imagePullPolicy: Always securityContext: privileged: true command: - backing-image-manager - --debug - daemon - --listen - 0.0.0.0:8000 readinessProbe: tcpSocket: port: 8000 volumeMounts: - name: disk-path mountPath: /data volumes: - name: disk-path hostPath: path: /var/lib/longhorn/ serviceAccountName: longhorn-service-account Download a backing image in the first pod: # alias bm=&#34;backing-image-manager backing-image&#34; # bm pull --name bi-test --uuid uuid-bi-test --download-url https://cloud-images. + Related issue https://github.com/longhorn/longhorn/issues/2416 Test step For test convenience, manually launch the backing image manager pods: apiVersion: apps/v1 kind: DaemonSet metadata: labels: app: backing-image-manager name: backing-image-manager namespace: longhorn-system spec: selector: matchLabels: app: backing-image-manager template: metadata: labels: app: backing-image-manager spec: containers: - name: backing-image-manager image: longhornio/backing-image-manager:master imagePullPolicy: Always securityContext: privileged: true command: - backing-image-manager - --debug - daemon - --listen - 0.0.0.0:8000 readinessProbe: tcpSocket: port: 8000 volumeMounts: - name: disk-path mountPath: /data volumes: - name: disk-path hostPath: path: /var/lib/longhorn/ serviceAccountName: longhorn-service-account Download a backing image in the first pod: # alias bm=&#34;backing-image-manager backing-image&#34; # bm pull --name bi-test --uuid uuid-bi-test --download-url https://cloud-images. 
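For the engine-binary recovery case above, the on-host path below is the conventional deployment location but should be treated as an assumption and verified on the cluster; the point is only to delete the binary and watch the engine-image DaemonSet pod restore it.

```bash
# On one host: inspect and remove the deployed engine binary (path assumed, verify first).
sudo ls /var/lib/longhorn/engine-binaries/
sudo rm -rf /var/lib/longhorn/engine-binaries/<engine-image-directory>

# The engine-image pod on that node should redeploy the binary; the EngineImage should return to deployed.
kubectl -n longhorn-system get pods -w | grep engine-image
kubectl -n longhorn-system get engineimage
```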
Test filesystem trim https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-filesystem-trim/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-filesystem-trim/ - Related issue https://github.com/longhorn/longhorn/issues/836 -Case 1: Test filesystem trim during writing Given A 10G volume created. -And Volume attached to node-1. -And Make a filesystem like EXT4 or XFS for the volume. -And Mount the filesystem on a mount point. -Then Run the below shell script with the correct mount point specified: -#!/usr/bin/env bash MOUNT_POINT=${1} dd if=/dev/urandom of=/mnt/data bs=1M count=8000 sync CKSUM=`md5sum /mnt/data | awk &#39;{print $1}&#39;` for INDEX in {1..10..1}; do rm -rf ${MOUNT_POINT}/data dd if=/mnt/data of=${MOUNT_POINT}/data &amp; RAND_SLEEP_INTERVAL=$(($(($RANDOM%50))+10)) sleep ${RAND_SLEEP_INTERVAL} fstrim ${MOUNT_POINT} while [ `ps aux | grep &#34;dd if&#34; | grep -v grep | wc -l` -eq &#34;1&#34; ] do sleep 1 done CUR_CKSUM=`md5sum ${MOUNT_POINT}/data | awk &#39;{print $1}&#39;` if [ $CUR_CKSUM ! + Related issue https://github.com/longhorn/longhorn/issues/836 Case 1: Test filesystem trim during writing Given A 10G volume created. And Volume attached to node-1. And Make a filesystem like EXT4 or XFS for the volume. And Mount the filesystem on a mount point. Then Run the below shell script with the correct mount point specified: #!/usr/bin/env bash MOUNT_POINT=${1} dd if=/dev/urandom of=/mnt/data bs=1M count=8000 sync CKSUM=`md5sum /mnt/data | awk &#39;{print $1}&#39;` for INDEX in {1..10..1}; do rm -rf ${MOUNT_POINT}/data dd if=/mnt/data of=${MOUNT_POINT}/data &amp; RAND_SLEEP_INTERVAL=$(($(($RANDOM%50))+10)) sleep ${RAND_SLEEP_INTERVAL} fstrim ${MOUNT_POINT} while [ `ps aux | grep &#34;dd if&#34; | grep -v grep | wc -l` -eq &#34;1&#34; ] do sleep 1 done CUR_CKSUM=`md5sum ${MOUNT_POINT}/data | awk &#39;{print $1}&#39;` if [ $CUR_CKSUM ! Test Frontend Traffic https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/ws-traffic-flood/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/ws-traffic-flood/ - Related issue https://github.com/longhorn/longhorn/issues/2372 -Test Frontend Traffic Given 100 pvc created. -And all pvcs deployed and detached. -When monitor traffic in frontend pod with nload. -apk add nload nload Then should not see a continuing large amount of traffic when there is no operation happening. The smaller spikes are mostly coming from event resources which possibly could be enhanced later (https://github.com/longhorn/longhorn/issues/2433). + Related issue https://github.com/longhorn/longhorn/issues/2372 Test Frontend Traffic Given 100 pvc created. And all pvcs deployed and detached. When monitor traffic in frontend pod with nload. apk add nload nload Then should not see a continuing large amount of traffic when there is no operation happening. The smaller spikes are mostly coming from event resources which possibly could be enhanced later (https://github.com/longhorn/longhorn/issues/2433). 
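Before running the trim script in the filesystem-trim case above, the attached volume has to be formatted and mounted; a minimal sequence, assuming the Longhorn block device appears under /dev/longhorn/&lt;volume-name&gt; as in the online-expansion test, is:

```bash
# Format, mount, and manually trim the attached Longhorn volume (device name assumed).
mkfs.ext4 /dev/longhorn/<volume-name>
mkdir -p /mnt/trim-test
mount /dev/longhorn/<volume-name> /mnt/trim-test
fstrim /mnt/trim-test
```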
Test Frontend Web-socket Data Transfer When Resource Not Updated https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/full-ws-data-tranfer-when-no-updates/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/full-ws-data-tranfer-when-no-updates/ - Related issue https://github.com/longhorn/longhorn-manager/pull/918 https://github.com/longhorn/longhorn/issues/2646 https://github.com/longhorn/longhorn/issues/2591 -Test Data Send Over Web-socket When No Resource Updated Given 1 PVC/Pod created. And the Pod is not writing to the mounted volume. -When monitor network traffic with browser inspect tool. -Then wait for 3 mins And should not see data send over web-socket when there are no updates to the resources. -Test Data Send Over Web-socket Resource Updated Given monitor network traffic with browser inspect tool. + Related issue https://github.com/longhorn/longhorn-manager/pull/918 https://github.com/longhorn/longhorn/issues/2646 https://github.com/longhorn/longhorn/issues/2591 Test Data Send Over Web-socket When No Resource Updated Given 1 PVC/Pod created. And the Pod is not writing to the mounted volume. When monitor network traffic with browser inspect tool. Then wait for 3 mins And should not see data send over web-socket when there are no updates to the resources. Test Data Send Over Web-socket Resource Updated Given monitor network traffic with browser inspect tool. Test helm on Rancher deployed Windows Cluster https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-helm-install-on-rancher-deployed-windows-cluster/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-helm-install-on-rancher-deployed-windows-cluster/ - Related issue https://github.com/longhorn/longhorn/issues/4246 -Test Install Given Rancher cluster. -And 3 new instances for the Windows cluster following Architecture Requirements. -And docker installed on the 3 Windows cluster instances. -And Disabled Private IP Address Checks for the 3 Windows cluster instances. -And Created new Custom Windows cluster with Rancher. -Select Flannel for Network Provider Enable Windows Support -And Added the 3 nodes to the Rancher Windows cluster. -Add Linux Master Node + Related issue https://github.com/longhorn/longhorn/issues/4246 Test Install Given Rancher cluster. And 3 new instances for the Windows cluster following Architecture Requirements. And docker installed on the 3 Windows cluster instances. And Disabled Private IP Address Checks for the 3 Windows cluster instances. And Created new Custom Windows cluster with Rancher. Select Flannel for Network Provider Enable Windows Support And Added the 3 nodes to the Rancher Windows cluster. Add Linux Master Node Test Helm uninstall Longhorn in different namespace https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-helm-uninstall-different-namespace/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-helm-uninstall-different-namespace/ - Related issue https://github.com/longhorn/longhorn/issues/2034 -Test Given helm install Longhorn in different namespace -When helm uninstall Longhorn -Then Longhorn should complete uninstalling. + Related issue https://github.com/longhorn/longhorn/issues/2034 Test Given helm install Longhorn in different namespace When helm uninstall Longhorn Then Longhorn should complete uninstalling. 
Test IM Proxy connection metrics https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-grpc-proxy/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-grpc-proxy/ - Related issue https://github.com/longhorn/longhorn/issues/2821 https://github.com/longhorn/longhorn/issues/4038 -Test gRPC proxy Given Longhorn exist in the cluster. -And Monitoring stack exist in the cluster. -When Execute longhorn_instance_manager_proxy_grpc_connection in Prometheus UI. -Then Metric data shows in Prometheus UI. -When Monitor longhorn_instance_manager_proxy_grpc_connection in Grafana UI Panel. -And Run automation regression. -Then Connections should return to 0 when tests complete. + Related issue https://github.com/longhorn/longhorn/issues/2821 https://github.com/longhorn/longhorn/issues/4038 Test gRPC proxy Given Longhorn exist in the cluster. And Monitoring stack exist in the cluster. When Execute longhorn_instance_manager_proxy_grpc_connection in Prometheus UI. Then Metric data shows in Prometheus UI. When Monitor longhorn_instance_manager_proxy_grpc_connection in Grafana UI Panel. And Run automation regression. Then Connections should return to 0 when tests complete. Test instance manager cleanup during uninstall @@ -889,69 +677,49 @@ Then Connections should return to 0 when tests complete. https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/instance-manager-streaming-connection-recovery/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/instance-manager-streaming-connection-recovery/ - Related issue https://github.com/longhorn/longhorn/issues/2561 -Test Step Given A cluster with Longhorn -And create a volume and attach it to a pod. -And exec into a longhorn manager pod and kill the connection with an engine or replica instance manager pod. The connections are instance manager pods&rsquo; IP with port 8500. -$ kl exec -it longhorn-manager-5z8zn -- bash root@longhorn-manager-5z8zn:/# ss Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port tcp ESTAB 0 0 10. + Related issue https://github.com/longhorn/longhorn/issues/2561 Test Step Given A cluster with Longhorn And create a volume and attach it to a pod. And exec into a longhorn manager pod and kill the connection with an engine or replica instance manager pod. The connections are instance manager pods&rsquo; IP with port 8500. $ kl exec -it longhorn-manager-5z8zn -- bash root@longhorn-manager-5z8zn:/# ss Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port tcp ESTAB 0 0 10. Test ISCSI Installation on EKS https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/iscsi_installation/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/iscsi_installation/ - This is for EKS or similar users who doesn&rsquo;t need to log into each host to install &lsquo;ISCSI&rsquo; individually. -Test steps: -Create an EKS cluster with 3 nodes. Run the following command to install iscsi on every nodes. kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/iscsi/longhorn-iscsi-installation.yaml In Longhorn Manager Repo Directory run: kubectl apply -Rf ./deploy/install/ Longhorn should be able installed successfully. Try to create a pod with a pvc: kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/examples/simple_pvc.yaml kubectl apply -f https://raw. 
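The streaming-connection recovery step above (killing the manager-to-instance-manager connections on port 8500) can be done from inside the longhorn-manager pod. This is a sketch: ss -K only works on kernels built with socket-destroy support (CONFIG_INET_DIAG_DESTROY), so the exact kill mechanism may need to be adapted.

```bash
# List the established connections from the manager to instance managers (port 8500).
kubectl -n longhorn-system exec -it <longhorn-manager-pod> -- ss -tn state established '( dport = :8500 )'

# Forcibly close them (requires CONFIG_INET_DIAG_DESTROY in the node kernel).
kubectl -n longhorn-system exec -it <longhorn-manager-pod> -- ss -K -tn state established '( dport = :8500 )'
```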
+ This is for EKS or similar users who doesn&rsquo;t need to log into each host to install &lsquo;ISCSI&rsquo; individually. Test steps: Create an EKS cluster with 3 nodes. Run the following command to install iscsi on every nodes. kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/iscsi/longhorn-iscsi-installation.yaml In Longhorn Manager Repo Directory run: kubectl apply -Rf ./deploy/install/ Longhorn should be able installed successfully. Try to create a pod with a pvc: kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/examples/simple_pvc.yaml kubectl apply -f https://raw. Test kubelet restart on a node of the cluster https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/kubelet-restart/kubelet-restart-on-a-node/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/kubelet-restart/kubelet-restart-on-a-node/ - Related issues: https://github.com/longhorn/longhorn/issues/2629 -Case 1: Kubelet restart on RKE1 multi node cluster: Create a RKE1 cluster with config of 1 etcd/control plane and 3 worker nodes. Deploy Longhorn on the cluster. Deploy prometheus monitoring app on the cluster which is using Longhorn storage class or deploy a statefulSet with Longhorn volume. Write some data into the mount point and compute the md5sum. Restart the kubelet on the node where the statefulSet or Prometheus pod is running using the command sudo docker restart kubelet Observe the volume. + Related issues: https://github.com/longhorn/longhorn/issues/2629 Case 1: Kubelet restart on RKE1 multi node cluster: Create a RKE1 cluster with config of 1 etcd/control plane and 3 worker nodes. Deploy Longhorn on the cluster. Deploy prometheus monitoring app on the cluster which is using Longhorn storage class or deploy a statefulSet with Longhorn volume. Write some data into the mount point and compute the md5sum. Restart the kubelet on the node where the statefulSet or Prometheus pod is running using the command sudo docker restart kubelet Observe the volume. Test Label-driven Recurring Job https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/label-driven-recurring-job/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/label-driven-recurring-job/ - Related issue https://github.com/longhorn/longhorn/issues/467 -Test Recurring Job Concurrency Given create snapshot recurring job with concurrency set to 2 and include snapshot recurring job default in groups. -When create volume test-job-1. -And create volume test-job-2. -And create volume test-job-3. -And create volume test-job-4. -And create volume test-job-5. -Then moniter the cron job pod log. -And should see 2 jobs created concurrently. -When update snapshot1 recurring job with concurrency set to 3. -Then moniter the cron job pod log. + Related issue https://github.com/longhorn/longhorn/issues/467 Test Recurring Job Concurrency Given create snapshot recurring job with concurrency set to 2 and include snapshot recurring job default in groups. When create volume test-job-1. And create volume test-job-2. And create volume test-job-3. And create volume test-job-4. And create volume test-job-5. Then moniter the cron job pod log. And should see 2 jobs created concurrently. When update snapshot1 recurring job with concurrency set to 3. Then moniter the cron job pod log. 
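For the label-driven recurring-job case above, a snapshot recurring job with concurrency 2 in the default group can be created roughly as below; the apiVersion, cron schedule, and retain value are assumptions that may differ by Longhorn release.

```bash
kubectl apply -f - <<'EOF'
apiVersion: longhorn.io/v1beta1
kind: RecurringJob
metadata:
  name: snapshot-1
  namespace: longhorn-system
spec:
  cron: "* * * * *"   # every minute, so the concurrency behaviour shows up quickly
  task: snapshot
  groups:
  - default
  retain: 1
  concurrency: 2
EOF

# Then monitor the cron job pod logs while volumes test-job-1..5 exist.
kubectl -n longhorn-system get pods | grep snapshot-1
```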
Test Longhorn components recovery https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/test-longhorn-component-recovery/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/test-longhorn-component-recovery/ - This is a simple test to check if all the components are recoverable. -Test data setup: Deploy Longhorn on a 3-node cluster. Create a volume vol-1 using Longhorn UI. Create a volume vol-2 using the Longhorn storage class. Create a volume vol-3 with backing image. Create an RWX volume vol-4. Write some data in all the volumes created and compute the md5sum. Have all the volumes in attached state. Test steps: Delete the IM-e from every volume and make sure every volume recovers. + This is a simple test to check if all the components are recoverable. Test data setup: Deploy Longhorn on a 3-node cluster. Create a volume vol-1 using Longhorn UI. Create a volume vol-2 using the Longhorn storage class. Create a volume vol-3 with backing image. Create an RWX volume vol-4. Write some data in all the volumes created and compute the md5sum. Have all the volumes in attached state. Test steps: Delete the IM-e from every volume and make sure every volume recovers. Test Longhorn deployment on RKE2 v1.24- with CIS-1.6 profile https://longhorn.github.io/longhorn-tests/manual/pre-release/environment/rke2-cis-1.6-profile/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/environment/rke2-cis-1.6-profile/ - Related issue This test was created in response to 2292, which used CIS-1.5. However, RKE2 generally does not recommend or encourage using CIS-1.5 in favor of CIS-1.6. -Scenario Prepare 1 control plane node and 3 worker nodes. Install RKE2 v1.24 with CIS-1.6 profile on 1 control plane node. sudo su - systemctl disable firewalld # On a supporting OS. systemctl stop firewalld # On a supporting OS. yum install iscsi-initiator-utils # Or the OS equivalent. + Related issue This test was created in response to 2292, which used CIS-1.5. However, RKE2 generally does not recommend or encourage using CIS-1.5 in favor of CIS-1.6. Scenario Prepare 1 control plane node and 3 worker nodes. Install RKE2 v1.24 with CIS-1.6 profile on 1 control plane node. sudo su - systemctl disable firewalld # On a supporting OS. systemctl stop firewalld # On a supporting OS. yum install iscsi-initiator-utils # Or the OS equivalent. Test Longhorn deployment on RKE2 v1.25+ with CIS-1.23 profile https://longhorn.github.io/longhorn-tests/manual/pre-release/environment/rke2-cis-1.23-profile/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/environment/rke2-cis-1.23-profile/ - Related issue This is an expansion of Test Longhorn deployment on RKE2 v1.24- with CIS-1.6 profile, which was created in response to 2292. However, later versions of RKE2 only support CIS-1.23. -Scenario Prepare 1 control plane node and 3 worker nodes. Install the latest RKE2 with CIS-1.23 profile on 1 control plane node. sudo su - systemctl disable firewalld # On a supporting OS. systemctl stop firewalld # On a supporting OS. + Related issue This is an expansion of Test Longhorn deployment on RKE2 v1.24- with CIS-1.6 profile, which was created in response to 2292. However, later versions of RKE2 only support CIS-1.23. Scenario Prepare 1 control plane node and 3 worker nodes. Install the latest RKE2 with CIS-1.23 profile on 1 control plane node. sudo su - systemctl disable firewalld # On a supporting OS.
systemctl stop firewalld # On a supporting OS. Test longhorn manager NPE caused by backup creation @@ -972,26 +740,14 @@ Scenario Prepare 1 control plane node and 3 worker nodes. Install the latest RKE https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-system-backup/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-system-backup/ - Steps Given Custom resource SystemBackup (foo) exist in AWS S3, -And System backup (foo) downloaded from AWS S3. -And Custom resource SystemBackup (foo) deleted. -When Upload the system backup (foo) to AWS S3. -And Create a new custom resource SystemBackup(foo). -This needs to be done before the system backup gets synced to the cluster. -Then Should see the synced messages in the custom resource SystemBackup(foo). -Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Syncing 9m29s longhorn-system-backup-controller Syncing system backup from backup target Normal Synced 9m28s longhorn-system-backup-controller Synced system backup from backup target + Steps Given Custom resource SystemBackup (foo) exist in AWS S3, And System backup (foo) downloaded from AWS S3. And Custom resource SystemBackup (foo) deleted. When Upload the system backup (foo) to AWS S3. And Create a new custom resource SystemBackup(foo). This needs to be done before the system backup gets synced to the cluster. Then Should see the synced messages in the custom resource SystemBackup(foo). Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Syncing 9m29s longhorn-system-backup-controller Syncing system backup from backup target Normal Synced 9m28s longhorn-system-backup-controller Synced system backup from backup target Test Node Delete https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/delete-node/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/delete-node/ - Related issue https://github.com/longhorn/longhorn/issues/2186 https://github.com/longhorn/longhorn/issues/2462 -Delete Method Should verify with both of the delete methods. -Bulk Delete - This is the Delete on the Node page. Node Delete - This is the Remove Node for each node Operation drop-down list. Test Node Delete - should grey out when node not down Given node not Down. -When Try to delete any node. -Then Should see button greyed out. -Test Node Delete Given pod with pvc created. + Related issue https://github.com/longhorn/longhorn/issues/2186 https://github.com/longhorn/longhorn/issues/2462 Delete Method Should verify with both of the delete methods. Bulk Delete - This is the Delete on the Node page. Node Delete - This is the Remove Node for each node Operation drop-down list. Test Node Delete - should grey out when node not down Given node not Down. When Try to delete any node. Then Should see button greyed out. Test Node Delete Given pod with pvc created. Test node deletion @@ -1005,64 +761,42 @@ Test Node Delete Given pod with pvc created. https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/test-node-drain-policy/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/test-node-drain-policy/ - With node-drain-policy is block-if-contains-last-replica Note: Starting from v1.5.x, it is not necessary to check for the presence of longhorn-admission-webhook and longhorn-conversion-webhook. Please refer to the Longhorn issue #5590 for more details. 
-Starting from v1.5.x, observe that the instance-manager-r and instance-manager-e are combined into instance-manager. Ref 5208 -1. Basic unit tests 1.1 Single worker node cluster with separate master node 1.1.1 RWO volumes -Deploy Longhorn Verify that there is no PDB for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook Manually create a PVC (simulate the volume which has never been attached scenario) Verify that there is no PDB for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook because there is no attached volume Create a deployment that uses one RW0 Longhorn volume. + With node-drain-policy is block-if-contains-last-replica Note: Starting from v1.5.x, it is not necessary to check for the presence of longhorn-admission-webhook and longhorn-conversion-webhook. Please refer to the Longhorn issue #5590 for more details. Starting from v1.5.x, observe that the instance-manager-r and instance-manager-e are combined into instance-manager. Ref 5208 1. Basic unit tests 1.1 Single worker node cluster with separate master node 1.1.1 RWO volumes Deploy Longhorn Verify that there is no PDB for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook Manually create a PVC (simulate the volume which has never been attached scenario) Verify that there is no PDB for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook because there is no attached volume Create a deployment that uses one RW0 Longhorn volume. Test Node ID Change During Backing Image Creation https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-node-id-change-during-backing-image-creation/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-node-id-change-during-backing-image-creation/ - Related issue https://github.com/longhorn/longhorn/issues/4887 -Steps Given A relatively large file so that uploading it would take several minutes at least. -And Upload the file as a backing image. -And Monitor the longhorn manager pod logs. -When Add new nodes for the cluster or new disks for the existing Longhorn nodes during the upload. -Then Should see the upload success. -And Should not see error messages like below in the longhorn manager pods. + Related issue https://github.com/longhorn/longhorn/issues/4887 Steps Given A relatively large file so that uploading it would take several minutes at least. And Upload the file as a backing image. And Monitor the longhorn manager pod logs. When Add new nodes for the cluster or new disks for the existing Longhorn nodes during the upload. Then Should see the upload success. And Should not see error messages like below in the longhorn manager pods. Test Node Selector https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/test-node-selector/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/test-node-selector/ - Prepare the cluster Using Rancher RKE to create a cluster of 2 Windows worker nodes and 3 Linux worker nodes. 
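The PDB verifications in the node-drain-policy unit tests above are easiest to run as a single command per step; instance-manager PDBs will also appear in the output.

```bash
# Show which Longhorn components currently have a PodDisruptionBudget.
kubectl -n longhorn-system get pdb
# Expect no PDB for csi-attacher/csi-provisioner (and the webhooks on releases before v1.5.x)
# until at least one volume is attached.
```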
Rancher will add the taint cattle.io/os=linux:NoSchedule to Linux nodes Kubernetes will add label kubernetes.io/os:linux to Linux nodes Test steps Repeat the following steps for each type of Longhorn installation: Rancher, Helm, Kubectl: -Follow the Longhorn document at the PR https://github.com/longhorn/website/pull/287 to install Longhorn with toleration cattle.io/os=linux:NoSchedule and node selector kubernetes.io/os:linux Verify that Longhorn get deployed successfully on the 3 Linux nodes Verify all volume basic functionalities is working ok Create a volume of 3 replica named vol-1 Add label longhorn. + Prepare the cluster Using Rancher RKE to create a cluster of 2 Windows worker nodes and 3 Linux worker nodes. Rancher will add the taint cattle.io/os=linux:NoSchedule to Linux nodes Kubernetes will add label kubernetes.io/os:linux to Linux nodes Test steps Repeat the following steps for each type of Longhorn installation: Rancher, Helm, Kubectl: Follow the Longhorn document at the PR https://github.com/longhorn/website/pull/287 to install Longhorn with toleration cattle.io/os=linux:NoSchedule and node selector kubernetes.io/os:linux Verify that Longhorn get deployed successfully on the 3 Linux nodes Verify all volume basic functionalities is working ok Create a volume of 3 replica named vol-1 Add label longhorn. Test NPE when longhorn UI deployment CR not exist https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-npe-when-longhorn-ui-deployment-not-exist/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-npe-when-longhorn-ui-deployment-not-exist/ - Related issue https://github.com/longhorn/longhorn/issues/4065 -Test Given helm install Longhorn -When delete deployment/longhorn-ui And update setting/kubernetes-cluster-autoscaler-enabled to true or false -Then longhorn-manager pods should still be Running. + Related issue https://github.com/longhorn/longhorn/issues/4065 Test Given helm install Longhorn When delete deployment/longhorn-ui And update setting/kubernetes-cluster-autoscaler-enabled to true or false Then longhorn-manager pods should still be Running. Test Online Expansion https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-online-expansion/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-online-expansion/ - Related issue https://github.com/longhorn/longhorn/issues/1674 -Test online expansion with continuous reading/writing Given Prepare a relatively large file (5Gi for example) with the checksum calculated. -And Create and attach a volume. -And Monitor the instance manager pod logs. -When Use dd to copy data from the file to the Longhorn block device. -dd if=/mnt/data of=/dev/longhorn/vol bs=1M And Do online expansion for the volume during the copying. -Then The expansion should success. The corresponding block device on the attached node is expanded. + Related issue https://github.com/longhorn/longhorn/issues/1674 Test online expansion with continuous reading/writing Given Prepare a relatively large file (5Gi for example) with the checksum calculated. And Create and attach a volume. And Monitor the instance manager pod logs. When Use dd to copy data from the file to the Longhorn block device. dd if=/mnt/data of=/dev/longhorn/vol bs=1M And Do online expansion for the volume during the copying. Then The expansion should success. The corresponding block device on the attached node is expanded. 
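For the online-expansion case above, the copy-and-expand flow can be scripted roughly as below. Expanding through a PVC patch is only one option (the test can also expand from the UI), and the device name and 10Gi target size are examples, not values from the test case.

```bash
# Copy the prepared file onto the Longhorn block device while watching the instance manager logs.
dd if=/mnt/data of=/dev/longhorn/vol bs=1M &

# Trigger online expansion while the copy is running (PVC-backed volume assumed; size is an example).
kubectl patch pvc <pvc-name> --type merge -p '{"spec":{"resources":{"requests":{"storage":"10Gi"}}}}'

# After the expansion completes, the block device on the attached node should report the new size.
blockdev --getsize64 /dev/longhorn/vol
```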
Test PVC Name and Namespace included in the volume metrics https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-pvc-name-and-namespace-included-in-volume-metrics/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-pvc-name-and-namespace-included-in-volume-metrics/ - Related issues https://github.com/longhorn/longhorn/issues/5297 https://github.com/longhorn/longhorn-manager/pull/2284 Test step Given created 2 volumes (volume-1, volume-2) -When PVC created for volume (volume-1) And attached volumes (volume-1, volume-2) -Then metrics with longhorn_volume_ prefix should include pvc=&quot;volume-1&quot; -curl -sSL http://10.0.2.212:32744/metrics | grep longhorn_volume | grep ip-10-0-2-151 | grep volume-1 longhorn_volume_actual_size_bytes{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_capacity_bytes{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 1.073741824e+09 longhorn_volume_read_iops{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_read_latency{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_read_throughput{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_robustness{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 1 longhorn_volume_state{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 2 longhorn_volume_write_iops{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_write_latency{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_write_throughput{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 And metrics with longhorn_volume_ prefix should include pvc=&quot;&quot; for (volume-2) + Related issues https://github.com/longhorn/longhorn/issues/5297 https://github.com/longhorn/longhorn-manager/pull/2284 Test step Given created 2 volumes (volume-1, volume-2) When PVC created for volume (volume-1) And attached volumes (volume-1, volume-2) Then metrics with longhorn_volume_ prefix should include pvc=&quot;volume-1&quot; curl -sSL http://10.0.2.212:32744/metrics | grep longhorn_volume | grep ip-10-0-2-151 | grep volume-1 longhorn_volume_actual_size_bytes{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_capacity_bytes{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 1.073741824e+09 longhorn_volume_read_iops{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_read_latency{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_read_throughput{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 
longhorn_volume_robustness{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 1 longhorn_volume_state{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 2 longhorn_volume_write_iops{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_write_latency{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_write_throughput{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 And metrics with longhorn_volume_ prefix should include pvc=&quot;&quot; for (volume-2) Test Read Write Many Feature @@ -1076,57 +810,35 @@ curl -sSL http://10.0.2.212:32744/metrics | grep longhorn_volume | grep ip-10-0- https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-replica-disk-soft-anti-affinity/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-replica-disk-soft-anti-affinity/ - Related issue https://github.com/longhorn/longhorn/issues/3823 -Test initial behavior of global Replica Disk Soft Anti-Affinity setting Given A newly created Longhorn cluster -Then Replica Zone Disk Anti-Affinity shows as false in the UI -And the replica-soft-anti-affinity setting shows false with kubectl -Test initial behavior of global Replica Disk Soft Anti-Affinity setting after upgrade Given A newly upgraded Longhorn cluster -Then Replica Zone Disk Anti-Affinity shows as false in the UI -And the replica-soft-anti-affinity shows false with kubectl + Related issue https://github.com/longhorn/longhorn/issues/3823 Test initial behavior of global Replica Disk Soft Anti-Affinity setting Given A newly created Longhorn cluster Then Replica Zone Disk Anti-Affinity shows as false in the UI And the replica-soft-anti-affinity setting shows false with kubectl Test initial behavior of global Replica Disk Soft Anti-Affinity setting after upgrade Given A newly upgraded Longhorn cluster Then Replica Zone Disk Anti-Affinity shows as false in the UI And the replica-soft-anti-affinity shows false with kubectl Test replica scale-down warning https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-replica-scale-down-warning/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-replica-scale-down-warning/ - Related issue https://github.com/longhorn/longhorn/issues/4120 -Steps Given Replica Auto Balance set to least-effort or best-effort. -And Volume with 3 replicas created. -And Volume attached to node-1. -And Monitor node-1 manager pod events. -kubectl alpha events -n longhorn-system pod &lt;node-1 manager pod&gt; -w When Update replica count to 1. -Then Should see Normal replice delete event. -Normal Delete Engine/t1-e-6a846a7a Removed unknown replica tcp://10.42.2.94:10000 from engine And Should not see Warning unknown replica detect event. + Related issue https://github.com/longhorn/longhorn/issues/4120 Steps Given Replica Auto Balance set to least-effort or best-effort. And Volume with 3 replicas created. And Volume attached to node-1. And Monitor node-1 manager pod events. kubectl alpha events -n longhorn-system pod &lt;node-1 manager pod&gt; -w When Update replica count to 1. Then Should see Normal replice delete event. 
Normal Delete Engine/t1-e-6a846a7a Removed unknown replica tcp://10.42.2.94:10000 from engine And Should not see Warning unknown replica detect event. Test RWX share-mount ownership reset https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/rwx-mount-ownership-reset/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/rwx-mount-ownership-reset/ - Related issue https://github.com/longhorn/longhorn/issues/2357 -Test RWX share-mount ownership Given Setup one of cluster node to use host FQDN. -root@ip-172-30-0-139:/home/ubuntu# cat /etc/hosts 127.0.0.1 localhost 54.255.224.72 ip-172-30-0-139.lan ip-172-30-0-139 root@ip-172-30-0-139:/home/ubuntu# hostname ip-172-30-0-139 root@ip-172-30-0-139:/home/ubuntu# hostname -f ip-172-30-0-139.lan And Domain = localdomain is commented out in /etc/idmapd.conf on cluster hosts. This is to ensure localdomain is not enforce to sync between server and client. Ref: https://github.com/longhorn/website/pull/279 -root@ip-172-30-0-139:~# cat /etc/idmapd.conf [General] Verbosity = 0 Pipefs-Directory = /run/rpc_pipefs # set your own domain here, if it differs from FQDN minus hostname # Domain = localdomain [Mapping] Nobody-User = nobody Nobody-Group = nogroup And pod with rwx pvc deployed to the node with host FQDN. + Related issue https://github.com/longhorn/longhorn/issues/2357 Test RWX share-mount ownership Given Setup one of cluster node to use host FQDN. root@ip-172-30-0-139:/home/ubuntu# cat /etc/hosts 127.0.0.1 localhost 54.255.224.72 ip-172-30-0-139.lan ip-172-30-0-139 root@ip-172-30-0-139:/home/ubuntu# hostname ip-172-30-0-139 root@ip-172-30-0-139:/home/ubuntu# hostname -f ip-172-30-0-139.lan And Domain = localdomain is commented out in /etc/idmapd.conf on cluster hosts. This is to ensure localdomain is not enforce to sync between server and client. Ref: https://github.com/longhorn/website/pull/279 root@ip-172-30-0-139:~# cat /etc/idmapd.conf [General] Verbosity = 0 Pipefs-Directory = /run/rpc_pipefs # set your own domain here, if it differs from FQDN minus hostname # Domain = localdomain [Mapping] Nobody-User = nobody Nobody-Group = nogroup And pod with rwx pvc deployed to the node with host FQDN. Test S3 backupstore in a cluster sitting behind a HTTP proxy https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/test-s3-backupstore/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/test-s3-backupstore/ - Related issue: 3136 -Requirement: -Set up a stand alone Squid, HTTP web proxy To configure Squid proxy: a comment about squid config If setting up instance on AWS: a EC2 security group setting S3 with existing backups Steps: -Create credential for Backup Target $ secret_name=&#34;aws-secret-proxy&#34; $ proxy_ip=123.123.123.123 $ no_proxy_params=&#34;localhost,127.0.0.1,0.0.0.0,10.0.0.0/8,192.168.0.0/16&#34; $ kubectl create secret generic $secret_name \ --from-literal=AWS_ACCESS_KEY_ID=$AWS_ID \ --from-literal=AWS_SECRET_ACCESS_KEY=$AWS_KEY \ --from-literal=HTTP_PROXY=$proxy_ip:3128 \ --from-literal=HTTPS_PROXY=$proxy_ip:3128 \ --from-literal=NO_PROXY=$no_proxy_params \ -n longhorn-system Open Longhorn UI Click on Setting Scroll down to Backup Target Credential Secret Fill in $secret_name assigned in step 1. 
+ Related issue: 3136 Requirement: Set up a stand alone Squid, HTTP web proxy To configure Squid proxy: a comment about squid config If setting up instance on AWS: a EC2 security group setting S3 with existing backups Steps: Create credential for Backup Target $ secret_name=&#34;aws-secret-proxy&#34; $ proxy_ip=123.123.123.123 $ no_proxy_params=&#34;localhost,127.0.0.1,0.0.0.0,10.0.0.0/8,192.168.0.0/16&#34; $ kubectl create secret generic $secret_name \ --from-literal=AWS_ACCESS_KEY_ID=$AWS_ID \ --from-literal=AWS_SECRET_ACCESS_KEY=$AWS_KEY \ --from-literal=HTTP_PROXY=$proxy_ip:3128 \ --from-literal=HTTPS_PROXY=$proxy_ip:3128 \ --from-literal=NO_PROXY=$no_proxy_params \ -n longhorn-system Open Longhorn UI Click on Setting Scroll down to Backup Target Credential Secret Fill in $secret_name assigned in step 1. Test scalability with backing image https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.3/test-scalability-with-backing-image/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.3/test-scalability-with-backing-image/ - Test step Deploy a cluster with 3 worker nodes. The recommended nodes is 4v cores CPU + 8G memory at least. -Deploy Longhorn. -Launch 10 backing images with the following YAML: -apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test1 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2 --- apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test2 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2 --- apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test3 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image. + Test step Deploy a cluster with 3 worker nodes. The recommended nodes is 4v cores CPU + 8G memory at least. Deploy Longhorn. Launch 10 backing images with the following YAML: apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test1 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2 --- apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test2 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2 --- apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test3 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image. Test Service Account mount on host @@ -1140,9 +852,7 @@ apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test1 name https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/snapshot-purge-error-handling/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/snapshot-purge-error-handling/ - Related issue https://github.com/longhorn/longhorn/issues/1895 -Longhorn v1.1.1 handles the error during snapshot purge better and reports to Longhorn-manager. -Scenario-1 Create a volume with 3 replicas and attach to a pod. Write some data into the volume and take a snapshot. Delete a replica that will result in creating a system generated snapshot. Wait for replica to finish and take another snapshot. 
ssh into a node and resize the latest snapshot. (e.g dd if=/dev/urandom count=50 bs=1M of=&lt;SNAPSHOT-NAME&gt;. + Related issue https://github.com/longhorn/longhorn/issues/1895 Longhorn v1.1.1 handles the error during snapshot purge better and reports to Longhorn-manager. Scenario-1 Create a volume with 3 replicas and attach to a pod. Write some data into the volume and take a snapshot. Delete a replica that will result in creating a system generated snapshot. Wait for replica to finish and take another snapshot. ssh into a node and resize the latest snapshot. (e.g dd if=/dev/urandom count=50 bs=1M of=&lt;SNAPSHOT-NAME&gt;. Test snapshot purge retry @@ -1156,35 +866,21 @@ Scenario-1 Create a volume with 3 replicas and attach to a pod. Write some data https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-metadata-file/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-metadata-file/ - Related issue https://github.com/longhorn/longhorn/issues/6997 -Test Given Longhorn installed on SUSE Linux -When generated support-bundle with description and issue URL -Then issuedescription has the description in the metadata.yaml -And issueurl has the issue URL in the metadata.yaml + Related issue https://github.com/longhorn/longhorn/issues/6997 Test Given Longhorn installed on SUSE Linux When generated support-bundle with description and issue URL Then issuedescription has the description in the metadata.yaml And issueurl has the issue URL in the metadata.yaml Test Support Bundle Should Include Kubelet Log When On K3s Cluster https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-kubelet-log-for-k3s/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-kubelet-log-for-k3s/ - Related issue https://github.com/longhorn/longhorn/issues/7121 -Test Given Longhorn installed on K3s cluster -When generated support-bundle -Then should have worker node kubelet logs in k3s-agent-service.log -And should have control-plan node kubelet log in k3s-service.log (if Longhorn is deployed on control-plan node) + Related issue https://github.com/longhorn/longhorn/issues/7121 Test Given Longhorn installed on K3s cluster When generated support-bundle Then should have worker node kubelet logs in k3s-agent-service.log And should have control-plan node kubelet log in k3s-service.log (if Longhorn is deployed on control-plan node) Test Support Bundle Syslog Paths https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-syslog-paths/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-syslog-paths/ - Related issue https://github.com/longhorn/longhorn/issues/6544 -Test /var/log/messages Given Longhorn installed on SUSE Linux -When generated support-bundle -And syslog exists in the messages file -Test /var/log/syslog Given Longhorn installed on Ubuntu Linux -When generated support-bundle -And syslog exists in the syslog file + Related issue https://github.com/longhorn/longhorn/issues/6544 Test /var/log/messages Given Longhorn installed on SUSE Linux When generated support-bundle And syslog exists in the messages file Test /var/log/syslog Given Longhorn installed on Ubuntu Linux When generated support-bundle And syslog exists in the syslog file Test system upgrade with a new storage class being default @@ 
-1205,21 +901,14 @@ And syslog exists in the syslog file https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/system-upgrade-with-deprecated-cpu-setting/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/system-upgrade-with-deprecated-cpu-setting/ - Related issue https://github.com/longhorn/longhorn/issues/2207 -Test step Deploy a cluster that each node has different CPUs. Launch Longhorn v1.1.0. Deploy some workloads using Longhorn volumes. Upgrade to the latest Longhorn version. Validate: all workloads work fine and no instance manager pod crash during the upgrade. The fields node.Spec.EngineManagerCPURequest and node.Spec.ReplicaManagerCPURequest of each node are the same as the setting Guaranteed Engine CPU value in the old version * 1000. The old setting Guaranteed Engine CPU is deprecated with an empty value. + Related issue https://github.com/longhorn/longhorn/issues/2207 Test step Deploy a cluster that each node has different CPUs. Launch Longhorn v1.1.0. Deploy some workloads using Longhorn volumes. Upgrade to the latest Longhorn version. Validate: all workloads work fine and no instance manager pod crash during the upgrade. The fields node.Spec.EngineManagerCPURequest and node.Spec.ReplicaManagerCPURequest of each node are the same as the setting Guaranteed Engine CPU value in the old version * 1000. The old setting Guaranteed Engine CPU is deprecated with an empty value. Test the trim related option update for old volumes https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.1/test-the-trim-related-option-update-for-old-volumes/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.1/test-the-trim-related-option-update-for-old-volumes/ - Related issue https://github.com/longhorn/longhorn/issues/5218 -Test step Given Deploy Longhorn v1.3.2 -And Created and attached a volume. -And Upgrade Longhorn to the latest. -And Do live upgrade for the volume. (The 1st volume using the latest engine image but running in the old instance manager.) -And Created and attached a volume with the v1.3.2 engine image. (The 2nd volume using the old engine image but running in the new instance manager.) -When Try to update volume. + Related issue https://github.com/longhorn/longhorn/issues/5218 Test step Given Deploy Longhorn v1.3.2 And Created and attached a volume. And Upgrade Longhorn to the latest. And Do live upgrade for the volume. (The 1st volume using the latest engine image but running in the old instance manager.) And Created and attached a volume with the v1.3.2 engine image. (The 2nd volume using the old engine image but running in the new instance manager.) When Try to update volume. Test timeout on loss of network connectivity @@ -1240,51 +929,35 @@ When Try to update volume. https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/uninstallation/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/uninstallation/ - Stability of uninstallation Launch Longhorn system. -Use scripts to continuously create then delete multiple DaemonSets. 
-e.g., putting the following python test into the manager integration test directory and run it: from common import get_apps_api_client # NOQA def test_uninstall_script(): apps_api = get_apps_api_client() while True: for i in range(10): name = &#34;ds-&#34; + str(i) try: ds = apps_api.read_namespaced_daemon_set(name, &#34;default&#34;) if ds.status.number_ready == ds.status.number_ready: apps_api.delete_namespaced_daemon_set(name, &#34;default&#34;) except Exception: apps_api.create_namespaced_daemon_set( &#34;default&#34;, ds_manifest(name)) def ds_manifest(name): return { &#39;apiVersion&#39;: &#39;apps/v1&#39;, &#39;kind&#39;: &#39;DaemonSet&#39;, &#39;metadata&#39;: { &#39;name&#39;: name }, &#39;spec&#39;: { &#39;selector&#39;: { &#39;matchLabels&#39;: { &#39;app&#39;: name } }, &#39;template&#39;: { &#39;metadata&#39;: { &#39;labels&#39;: { &#39;app&#39;: name } }, &#39;spec&#39;: { &#39;terminationGracePeriodSeconds&#39;: 10, &#39;containers&#39;: [{ &#39;image&#39;: &#39;busybox&#39;, &#39;imagePullPolicy&#39;: &#39;IfNotPresent&#39;, &#39;name&#39;: &#39;sleep&#39;, &#39;args&#39;: [ &#39;/bin/sh&#39;, &#39;-c&#39;, &#39;while true;do date;sleep 5; done&#39; ], }] } }, } } Start to uninstall longhorn. + Stability of uninstallation Launch Longhorn system. Use scripts to continuously create then delete multiple DaemonSets. e.g., putting the following python test into the manager integration test directory and run it: from common import get_apps_api_client # NOQA def test_uninstall_script(): apps_api = get_apps_api_client() while True: for i in range(10): name = &#34;ds-&#34; + str(i) try: ds = apps_api.read_namespaced_daemon_set(name, &#34;default&#34;) if ds.status.number_ready == ds.status.number_ready: apps_api.delete_namespaced_daemon_set(name, &#34;default&#34;) except Exception: apps_api.create_namespaced_daemon_set( &#34;default&#34;, ds_manifest(name)) def ds_manifest(name): return { &#39;apiVersion&#39;: &#39;apps/v1&#39;, &#39;kind&#39;: &#39;DaemonSet&#39;, &#39;metadata&#39;: { &#39;name&#39;: name }, &#39;spec&#39;: { &#39;selector&#39;: { &#39;matchLabels&#39;: { &#39;app&#39;: name } }, &#39;template&#39;: { &#39;metadata&#39;: { &#39;labels&#39;: { &#39;app&#39;: name } }, &#39;spec&#39;: { &#39;terminationGracePeriodSeconds&#39;: 10, &#39;containers&#39;: [{ &#39;image&#39;: &#39;busybox&#39;, &#39;imagePullPolicy&#39;: &#39;IfNotPresent&#39;, &#39;name&#39;: &#39;sleep&#39;, &#39;args&#39;: [ &#39;/bin/sh&#39;, &#39;-c&#39;, &#39;while true;do date;sleep 5; done&#39; ], }] } }, } } Start to uninstall longhorn. Test upgrade for migrated Longhorn on Rancher https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-upgrade-for-migrated-longhorn/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-upgrade-for-migrated-longhorn/ - Related discussion https://github.com/longhorn/longhorn/discussions/4198 -Context: since few customers used our broken chart longhorn 100.2.1+up1.3.1 on Rancher (Now fixed) with the workaround. We would like to verify the future upgrade path for those customers. -Steps Set up a cluster of Kubernetes 1.20. Adding this repo to the apps section in new rancher UI repo: https://github.com/PhanLe1010/charts.git branch: release-v2.6-longhorn-1.3.1. Access old rancher UI by navigating to &lt;your-rancher-url&gt;/g. Install Longhorn 1.0.2. Create/attach some volumes. Create a few recurring snapshot/backup job that run every minutes. 
+ Related discussion https://github.com/longhorn/longhorn/discussions/4198 Context: since few customers used our broken chart longhorn 100.2.1+up1.3.1 on Rancher (Now fixed) with the workaround. We would like to verify the future upgrade path for those customers. Steps Set up a cluster of Kubernetes 1.20. Adding this repo to the apps section in new rancher UI repo: https://github.com/PhanLe1010/charts.git branch: release-v2.6-longhorn-1.3.1. Access old rancher UI by navigating to &lt;your-rancher-url&gt;/g. Install Longhorn 1.0.2. Create/attach some volumes. Create a few recurring snapshot/backup job that run every minutes. Test upgrade responder collecting extra info https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.5.0/test-upgrade-responder-collect-extra-info/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.5.0/test-upgrade-responder-collect-extra-info/ - Related issue https://github.com/longhorn/longhorn/issues/5235 -Test step Given Patch build and deploy Longhorn. -diff --git a/controller/setting_controller.go b/controller/setting_controller.go index de77b7246..ac6263ac5 100644 --- a/controller/setting_controller.go +++ b/controller/setting_controller.go @@ -49,7 +49,7 @@ const ( var ( upgradeCheckInterval = time.Hour settingControllerResyncPeriod = time.Hour - checkUpgradeURL = &#34;https://longhorn-upgrade-responder.rancher.io/v1/checkupgrade&#34; + checkUpgradeURL = &#34;http://longhorn-upgrade-responder.default.svc.cluster.local:8314/v1/checkupgrade&#34; ) type SettingController struct { Match the checkUpgradeURL with the application name: http://&lt;APP_NAME&gt;-upgrade-responder.default.svc.cluster.local:8314/v1/checkupgrade -And Deploy upgrade responder stack. -When Wait 1~2 hours for collection data to send to the influxDB database. + Related issue https://github.com/longhorn/longhorn/issues/5235 Test step Given Patch build and deploy Longhorn. diff --git a/controller/setting_controller.go b/controller/setting_controller.go index de77b7246..ac6263ac5 100644 --- a/controller/setting_controller.go +++ b/controller/setting_controller.go @@ -49,7 +49,7 @@ const ( var ( upgradeCheckInterval = time.Hour settingControllerResyncPeriod = time.Hour - checkUpgradeURL = &#34;https://longhorn-upgrade-responder.rancher.io/v1/checkupgrade&#34; + checkUpgradeURL = &#34;http://longhorn-upgrade-responder.default.svc.cluster.local:8314/v1/checkupgrade&#34; ) type SettingController struct { Match the checkUpgradeURL with the application name: http://&lt;APP_NAME&gt;-upgrade-responder.default.svc.cluster.local:8314/v1/checkupgrade And Deploy upgrade responder stack. When Wait 1~2 hours for collection data to send to the influxDB database. 
Test Version Bump of Kubernetes, API version group, CSI component's dependency version https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/test_version_bump/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/test_version_bump/ - GitHub issue: https://github.com/longhorn/longhorn/issues/2757 -Test with specific Kubernetes version For each Kubernetes version (1.18, 1.19, 1.20, 1.21, 1.22), test basic functionalities of Longhorn v1.2.0 (create/attach/detach/delete volume/backup/snapshot using yaml/UI) Test Kubernetes and Longhorn upgrade Deploy K3s v1.21 Deploy Longhorn v1.1.2 Create some workload pods using Longhorn volumes Upgrade Longhorn to v1.2.0 Verify that everything is OK Upgrade K3s to v1.22 Verify that everything is OK Retest the Upgrade Lease Lock We remove the client-go patch https://github. + GitHub issue: https://github.com/longhorn/longhorn/issues/2757 Test with specific Kubernetes version For each Kubernetes version (1.18, 1.19, 1.20, 1.21, 1.22), test basic functionalities of Longhorn v1.2.0 (create/attach/detach/delete volume/backup/snapshot using yaml/UI) Test Kubernetes and Longhorn upgrade Deploy K3s v1.21 Deploy Longhorn v1.1.2 Create some workload pods using Longhorn volumes Upgrade Longhorn to v1.2.0 Verify that everything is OK Upgrade K3s to v1.22 Verify that everything is OK Retest the Upgrade Lease Lock We remove the client-go patch https://github. Test Volume Replica Zone Soft Anti-Affinity Setting https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.5.0/test-the-volume-replica-scheduling/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.5.0/test-the-volume-replica-scheduling/ - Related issue https://github.com/longhorn/longhorn/issues/5358 -Test step - Enable Volume Replica Zone Soft Anti-Affinity Setting Given EKS Cluster with 3 nodes across 2 AWS zones (zone#1, zone#2) -And Deploy Longhorn v1.5.0 -And Disable global replica zone anti-affinity -And Create a volume with 2 replicas, replicaZoneSoftAntiAffinity=enabled and attach it to a node. -When Scale volume replicas to 3 -Then New replica should be scheduled -And No error messages in the longhorn manager pod logs. + Related issue https://github.com/longhorn/longhorn/issues/5358 Test step - Enable Volume Replica Zone Soft Anti-Affinity Setting Given EKS Cluster with 3 nodes across 2 AWS zones (zone#1, zone#2) And Deploy Longhorn v1.5.0 And Disable global replica zone anti-affinity And Create a volume with 2 replicas, replicaZoneSoftAntiAffinity=enabled and attach it to a node. When Scale volume replicas to 3 Then New replica should be scheduled And No error messages in the longhorn manager pod logs. Testing ext4 with custom fs params1 (no 64bit, no metadata_csum) @@ -1333,47 +1006,35 @@ And No error messages in the longhorn manager pod logs. https://longhorn.github.io/longhorn-tests/manual/rancher-integration/upgrade-using-rancher-ui/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/upgrade-using-rancher-ui/ - Note: Longhorn version v1.3.x doesn&rsquo;t support Kubernetes v1.25 and onwards -Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. 
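The scale-up check above can also be scripted against the helpers used by the tests.test_zone module shown later on this page. The sketch below is an assumption, not the actual manual test: the import path and the volume.updateReplicaCount call follow longhorn-tests conventions and should be verified against the client in use.

# Hedged sketch of the "scale replicas to 3, new replica gets scheduled" check.
# get_zone_replica_count, ZONE1/ZONE2, RETRY_COUNTS and RETRY_INTERVAL are the
# helpers/constants used by tests.test_zone elsewhere on this page; the import
# path and volume.updateReplicaCount are assumptions based on longhorn-tests.
import time

from test_zone import (ZONE1, ZONE2, RETRY_COUNTS, RETRY_INTERVAL,  # hypothetical path
                       get_zone_replica_count)

def scale_and_expect_zone_spread(client, volume_name, target=3):
    volume = client.by_id_volume(volume_name)
    volume.updateReplicaCount(replicaCount=target)  # scale volume replicas to 3

    for _ in range(RETRY_COUNTS):
        time.sleep(RETRY_INTERVAL)
        running = (
            get_zone_replica_count(client, volume_name, ZONE1, chk_running=True)
            + get_zone_replica_count(client, volume_name, ZONE2, chk_running=True)
        )
        if running == target:
            return  # the new replica was scheduled across the two zones
    raise AssertionError("new replica was not scheduled within the retry window")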
Upgrade single node cluster using Rancher UI - RKE2 cluster Given Single node RKE2 cluster provisioned in Rancher with K8s prior version with Longhorn installed -AND few RWO and RWX volumes attached with node/pod exists -AND 1 RWO and 1 RWX volumes unattached -AND 1 RWO volume with 50 Gi data + Note: Longhorn version v1.3.x doesn&rsquo;t support Kubernetes v1.25 and onwards Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. Upgrade single node cluster using Rancher UI - RKE2 cluster Given Single node RKE2 cluster provisioned in Rancher with K8s prior version with Longhorn installed AND few RWO and RWX volumes attached with node/pod exists AND 1 RWO and 1 RWX volumes unattached AND 1 RWO volume with 50 Gi data Upgrade Kubernetes using SUC https://longhorn.github.io/longhorn-tests/manual/rancher-integration/upgrade-using-suc/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/upgrade-using-suc/ - Note: Longhorn version v1.3.x doesn&rsquo;t support Kubernetes v1.25 and onwards -Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. Upgrade multi node cluster using SUC - K3s cluster Given Multi node (1 master and 3 worker) K3s cluster (not provisioned by Rancher) with K3s prior version with Longhorn installed -AND System Upgrade Controller deployed -AND few RWO and RWX volumes attached with node/pod exists -AND 1 RWO and 1 RWX volumes unattached + Note: Longhorn version v1.3.x doesn&rsquo;t support Kubernetes v1.25 and onwards Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. Upgrade multi node cluster using SUC - K3s cluster Given Multi node (1 master and 3 worker) K3s cluster (not provisioned by Rancher) with K3s prior version with Longhorn installed AND System Upgrade Controller deployed AND few RWO and RWX volumes attached with node/pod exists AND 1 RWO and 1 RWX volumes unattached Upgrade Lease Lock https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.2/upgrade-lease-lock/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.2/upgrade-lease-lock/ - The time it takes between the Longhorn Manager starting up and the upgrade completing for that Longhorn Manager can be used to determine if the upgrade lock was released correctly: -Create a fresh Longhorn installation or delete all of the Longhorn Manager Pods in the existing installation. Check the logs for the Longhorn Manager Pods and note the timestamps for the first line in the log and the timestamp for when the upgrade has completed. + The time it takes between the Longhorn Manager starting up and the upgrade completing for that Longhorn Manager can be used to determine if the upgrade lock was released correctly: Create a fresh Longhorn installation or delete all of the Longhorn Manager Pods in the existing installation. Check the logs for the Longhorn Manager Pods and note the timestamps for the first line in the log and the timestamp for when the upgrade has completed. Upgrade Longhorn with modified Storage Class https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/upgrade_with_modified_storageclass/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/upgrade_with_modified_storageclass/ - Intro Longhorn can be upgraded with modified Storage Class. 
-Related Issue https://github.com/longhorn/longhorn/issues/1527 -Test steps: Kubectl apply -f Install Longhorn v1.0.2 kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.0.2/deploy/longhorn.yaml Create a statefulset using longhorn storageclass for PVCs. Set the scale to 1. Observe that there is a workload pod (pod-1) is using 1 volume (vol-1) with 3 replicas. In Longhorn repo, on master branch. Modify numberOfReplicas: &quot;1&quot; in https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml. Upgrade Longhorn to master by running kubectl apply -f https://raw. + Intro Longhorn can be upgraded with modified Storage Class. Related Issue https://github.com/longhorn/longhorn/issues/1527 Test steps: Kubectl apply -f Install Longhorn v1.0.2 kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.0.2/deploy/longhorn.yaml Create a statefulset using longhorn storageclass for PVCs. Set the scale to 1. Observe that there is a workload pod (pod-1) is using 1 volume (vol-1) with 3 replicas. In Longhorn repo, on master branch. Modify numberOfReplicas: &quot;1&quot; in https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml. Upgrade Longhorn to master by running kubectl apply -f https://raw. Volume Deletion UI Warnings https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/ui-volume-deletion/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/ui-volume-deletion/ - A number of cases need to be manually tested in longhorn-ui. To test these cases, create the Volume with the specified conditions in each case, and then try to delete it. What is observed should match what is described in the test case: -A regular Volume. Only the default deletion prompt should show up asking to confirm deletion. A Volume with a Persistent Volume. The deletion prompt should tell the user that there is a Persistent Volume that will be deleted along with the Volume. + A number of cases need to be manually tested in longhorn-ui. To test these cases, create the Volume with the specified conditions in each case, and then try to delete it. What is observed should match what is described in the test case: A regular Volume. Only the default deletion prompt should show up asking to confirm deletion. A Volume with a Persistent Volume. The deletion prompt should tell the user that there is a Persistent Volume that will be deleted along with the Volume. diff --git a/integration/test_zone.html b/integration/test_zone.html index 62231545f2..2cd7f9e47a 100644 --- a/integration/test_zone.html +++ b/integration/test_zone.html @@ -36,6 +36,7 @@

Module tests.test_zone

from common import pvc, pod # NOQA
from common import volume_name # NOQA
+from common import cleanup_node_disks
from common import get_self_host_id
from common import create_and_wait_pod
@@ -531,6 +532,84 @@

Module tests.test_zone

assert z3_r_count == 2 +def test_replica_auto_balance_when_disabled_disk_scheduling_in_zone(client, core_api, volume_name): # NOQA + """ + Scenario: replica auto-balance when disk scheduling is disabled on nodes + in a zone. + + Issue: https://github.com/longhorn/longhorn/issues/6508 + + Given `replica-soft-anti-affinity` setting is `true`. + And node-1 is in zone-1. + node-2 is in zone-2. + node-3 is in zone-3. + And disk scheduling is disabled on node-3. + And create a volume with 3 replicas. + And attach the volume to test pod node. + And 3 replicas running in zone-1 and zone-2. + 0 replicas running in zone-3. + + When set `replica-auto-balance` to `best-effort`. + + Then 3 replicas running in zone-1 and zone-2. + 0 replicas running in zone-3. + And replica count remains stable across zones and nodes. + """ + # Set `replica-soft-anti-affinity` to `true`. + update_setting(client, SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "true") + + # Assign nodes to respective zones + node1, node2, node3 = client.list_node() + set_k8s_node_zone_label(core_api, node1.name, ZONE1) + set_k8s_node_zone_label(core_api, node2.name, ZONE2) + set_k8s_node_zone_label(core_api, node3.name, ZONE3) + wait_longhorn_node_zone_updated(client) + + # Disable disk scheduling on node 3 + cleanup_node_disks(client, node3.name) + + # Create a volume with 3 replicas + num_of_replicas = 3 + volume = client.create_volume(name=volume_name, + numberOfReplicas=num_of_replicas) + + # Wait for the volume to detach and attach it to the test pod node + volume = wait_for_volume_detached(client, volume_name) + volume.attach(hostId=get_self_host_id()) + + # Define a function to assert replica count + def assert_replica_count(is_stable=False): + for _ in range(RETRY_COUNTS): + time.sleep(RETRY_INTERVAL) + + zone3_replica_count = get_zone_replica_count( + client, volume_name, ZONE3, chk_running=True) + assert zone3_replica_count == 0 + + total_replica_count = \ + get_zone_replica_count( + client, volume_name, ZONE1, chk_running=True) + \ + get_zone_replica_count( + client, volume_name, ZONE2, chk_running=True) + + if is_stable: + assert total_replica_count == num_of_replicas + elif total_replica_count == num_of_replicas: + break + + assert total_replica_count == 3 + + # Perform the initial assertion to ensure the replica count is as expected + assert_replica_count() + + # Update the replica-auto-balance setting to `best-effort` + update_setting(client, SETTING_REPLICA_AUTO_BALANCE, "best-effort") + + # Perform the final assertion to ensure the replica count is as expected, + # and stable after the setting update + assert_replica_count(is_stable=True) + + def test_replica_auto_balance_when_replica_on_unschedulable_node(client, core_api, volume_name, request): # NOQA """ Scenario: replica auto-balance when replica already running on @@ -1322,6 +1401,108 @@

Functions

assert check_z2_r_count == z2_r_count +
+def test_replica_auto_balance_when_disabled_disk_scheduling_in_zone(client, core_api, volume_name) +
+
+

Scenario: replica auto-balance when disk scheduling is disabled on nodes +in a zone.

+

Issue: https://github.com/longhorn/longhorn/issues/6508

+

Given replica-soft-anti-affinity setting is true. +And node-1 is in zone-1. +node-2 is in zone-2. +node-3 is in zone-3. +And disk scheduling is disabled on node-3. +And create a volume with 3 replicas. +And attach the volume to test pod node. +And 3 replicas running in zone-1 and zone-2. +0 replicas running in zone-3.

+

When set replica-auto-balance to best-effort.

+

Then 3 replicas running in zone-1 and zone-2. +0 replicas running in zone-3. +And replica count remains stable across zones and nodes.

+
def test_replica_auto_balance_when_disabled_disk_scheduling_in_zone(client, core_api, volume_name):  # NOQA
+    """
+    Scenario: replica auto-balance when disk scheduling is disabled on nodes
+              in a zone.
+
+    Issue: https://github.com/longhorn/longhorn/issues/6508
+
+    Given `replica-soft-anti-affinity` setting is `true`.
+    And node-1 is in zone-1.
+        node-2 is in zone-2.
+        node-3 is in zone-3.
+    And disk scheduling is disabled on node-3.
+    And create a volume with 3 replicas.
+    And attach the volume to test pod node.
+    And 3 replicas running in zone-1 and zone-2.
+        0 replicas running in zone-3.
+
+    When set `replica-auto-balance` to `best-effort`.
+
+    Then 3 replicas running in zone-1 and zone-2.
+         0 replicas running in zone-3.
+    And replica count remains stable across zones and nodes.
+    """
+    # Set `replica-soft-anti-affinity` to `true`.
+    update_setting(client, SETTING_REPLICA_NODE_SOFT_ANTI_AFFINITY, "true")
+
+    # Assign nodes to respective zones
+    node1, node2, node3 = client.list_node()
+    set_k8s_node_zone_label(core_api, node1.name, ZONE1)
+    set_k8s_node_zone_label(core_api, node2.name, ZONE2)
+    set_k8s_node_zone_label(core_api, node3.name, ZONE3)
+    wait_longhorn_node_zone_updated(client)
+
+    # Disable disk scheduling on node 3
+    cleanup_node_disks(client, node3.name)
+
+    # Create a volume with 3 replicas
+    num_of_replicas = 3
+    volume = client.create_volume(name=volume_name,
+                                  numberOfReplicas=num_of_replicas)
+
+    # Wait for the volume to detach and attach it to the test pod node
+    volume = wait_for_volume_detached(client, volume_name)
+    volume.attach(hostId=get_self_host_id())
+
+    # Define a function to assert replica count
+    def assert_replica_count(is_stable=False):
+        for _ in range(RETRY_COUNTS):
+            time.sleep(RETRY_INTERVAL)
+
+            zone3_replica_count = get_zone_replica_count(
+                client, volume_name, ZONE3, chk_running=True)
+            assert zone3_replica_count == 0
+
+            total_replica_count = \
+                get_zone_replica_count(
+                    client, volume_name, ZONE1, chk_running=True) + \
+                get_zone_replica_count(
+                    client, volume_name, ZONE2, chk_running=True)
+
+            if is_stable:
+                assert total_replica_count == num_of_replicas
+            elif total_replica_count == num_of_replicas:
+                break
+
+        assert total_replica_count == 3
+
+    # Perform the initial assertion to ensure the replica count is as expected
+    assert_replica_count()
+
+    # Update the replica-auto-balance setting to `best-effort`
+    update_setting(client, SETTING_REPLICA_AUTO_BALANCE, "best-effort")
+
+    # Perform the final assertion to ensure the replica count is as expected,
+    # and stable after the setting update
+    assert_replica_count(is_stable=True)
+
+
def test_replica_auto_balance_when_replica_on_unschedulable_node(client, core_api, volume_name, request)
@@ -2366,6 +2547,7 @@

Index

  • k8s_node_zone_tags
  • test_replica_auto_balance_node_duplicates_in_multiple_zones
  • test_replica_auto_balance_should_respect_node_selector
  • +test_replica_auto_balance_when_disabled_disk_scheduling_in_zone
  • test_replica_auto_balance_when_replica_on_unschedulable_node
  • test_replica_auto_balance_zone_best_effort
  • test_replica_auto_balance_zone_best_effort_with_data_locality
  • diff --git a/manual/functional-test-cases/index.xml b/manual/functional-test-cases/index.xml index 8e8ac0a73d..d761b3ead2 100644 --- a/manual/functional-test-cases/index.xml +++ b/manual/functional-test-cases/index.xml @@ -12,120 +12,63 @@ https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/deployment/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/deployment/ - Installation Longhorn v1.1.2 and above - Support Kubernetes 1.18+ -Longhorn v1.0.0 to v1.1.1 - Support Kubernetes 1.14+. Default 1.16+ -Install using Rancher Apps &amp; MarketPlace App (Default) -Install using Helm chart from https://github.com/longhorn/longhorn/tree/master/chart -Install using YAML from https://github.com/longhorn/longhorn/blob/master/deploy/longhorn.yaml -Note: Longhorn UI can scale to multiple instances for HA purposes. -Uninstallation Make sure all the CRDs and other resources are cleaned up, following the uninstallation instruction. https://longhorn.io/docs/1.2.2/deploy/uninstall/ -Customizable Default Settings https://longhorn.io/docs/1.2.2/references/settings/ + Installation Longhorn v1.1.2 and above - Support Kubernetes 1.18+ Longhorn v1.0.0 to v1.1.1 - Support Kubernetes 1.14+. Default 1.16+ Install using Rancher Apps &amp; MarketPlace App (Default) Install using Helm chart from https://github.com/longhorn/longhorn/tree/master/chart Install using YAML from https://github.com/longhorn/longhorn/blob/master/deploy/longhorn.yaml Note: Longhorn UI can scale to multiple instances for HA purposes. Uninstallation Make sure all the CRDs and other resources are cleaned up, following the uninstallation instruction. https://longhorn.io/docs/1.2.2/deploy/uninstall/ Customizable Default Settings https://longhorn.io/docs/1.2.2/references/settings/ 2. UI https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/ui/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/ui/ - Accessibility of Longhorn UI # Test Case Test Instructions 1. Access Longhorn UI using rancher proxy 1. Create a cluster (3 worker nodes and 1 etcd/control plane) in rancher, Go to the default project. -2. Go to App, Click the launch app. -3. Select longhorn. -4. Select Rancher-Proxy under the Longhorn UI service. -5. Once the app is deployed successfully, click the /index.html link appears in App page. -6. The page should redirect to longhorn UI - https://rancher/k8s/clusters/c-aaaa/api/v1/namespaces/longhorn-system/services/http:longhorn-frontend:80/proxy/#/dashboard + Accessibility of Longhorn UI # Test Case Test Instructions 1. Access Longhorn UI using rancher proxy 1. Create a cluster (3 worker nodes and 1 etcd/control plane) in rancher, Go to the default project. 2. Go to App, Click the launch app. 3. Select longhorn. 4. Select Rancher-Proxy under the Longhorn UI service. 5. Once the app is deployed successfully, click the /index.html link appears in App page. 6. The page should redirect to longhorn UI - https://rancher/k8s/clusters/c-aaaa/api/v1/namespaces/longhorn-system/services/http:longhorn-frontend:80/proxy/#/dashboard 3. 
Volume https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/volume/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/volume/ - Test cases for Volume # Test Case Test Instructions Expected Results 1 Check volume Details Prerequisite: -* Longhorn Nodes has node tags -* Node Disks has disk tags -* Backup target is set to NFS server, or S3 compatible target -1. Create a workload using Longhorn volume -2. Check volume details page -3. Create volume backup * Volume Details -* State should be Attached -* Health should be healthy -* Frontend should be Block Device + Test cases for Volume # Test Case Test Instructions Expected Results 1 Check volume Details Prerequisite: * Longhorn Nodes has node tags * Node Disks has disk tags * Backup target is set to NFS server, or S3 compatible target 1. Create a workload using Longhorn volume 2. Check volume details page 3. Create volume backup * Volume Details * State should be Attached * Health should be healthy * Frontend should be Block Device 5. Kubernetes https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/kubernetes/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/kubernetes/ - Dynamic provisioning with StorageClass Can create and use volume using StorageClass -Can create a new StorageClass use new parameters and it will take effect on the volume created by the storage class. -If the PV reclaim policy is delete, once PVC and PV are deleted, Longhorn volume should be deleted. -Static provisioning using Longhorn created PV/PVC PVC can be used by the new workload -Delete the PVC will not result in PV deletion + Dynamic provisioning with StorageClass Can create and use volume using StorageClass Can create a new StorageClass use new parameters and it will take effect on the volume created by the storage class. If the PV reclaim policy is delete, once PVC and PV are deleted, Longhorn volume should be deleted. Static provisioning using Longhorn created PV/PVC PVC can be used by the new workload Delete the PVC will not result in PV deletion 6. Backup https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/backup/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/backup/ - Automation Tests # Test name Description tag 1 test_backup Test basic backup -Setup: -1. Create a volume and attach to the current node -2. Run the test for all the available backupstores. -Steps: -1. Create a backup of volume -2. Restore the backup to a new volume -3. Attach the new volume and make sure the data is the same as the old one -4. Detach the volume and delete the backup. + Automation Tests # Test name Description tag 1 test_backup Test basic backup Setup: 1. Create a volume and attach to the current node 2. Run the test for all the available backupstores. Steps: 1. Create a backup of volume 2. Restore the backup to a new volume 3. Attach the new volume and make sure the data is the same as the old one 4. Detach the volume and delete the backup. 7. Node https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/node/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/node/ - UI specific test cases # Test Case Test Instructions Expected Results 1 Storage details * Prerequisites -* Longhorn Installed -1. Verify the allocated/used storage show the right data in node details page. -2. 
Create a volume of 20 GB and attach to a pod and verify the storage allocated/used is shown correctly. Without any volume, allocated should be 0 and on creating new volume it should be updated as per volume present. + UI specific test cases # Test Case Test Instructions Expected Results 1 Storage details * Prerequisites * Longhorn Installed 1. Verify the allocated/used storage show the right data in node details page. 2. Create a volume of 20 GB and attach to a pod and verify the storage allocated/used is shown correctly. Without any volume, allocated should be 0 and on creating new volume it should be updated as per volume present. 8. Scheduling https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/scheduling/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/scheduling/ - Manual Test Test name Prerequisite Expectation EKS across zone scheduling Prerequisite: -* EKS Cluster with 3 nodes across two AWS zones (zone#1, zone#2) -1. Create a volume with 2 replicas, and attach it to a node. -2. Delete a replica scheduled to each zone, repeat it few times -3. Scale volume replicas = 3 -4. Scale volume replicas to 4 * Volume replicas should be scheduled one per AWS zone + Manual Test Test name Prerequisite Expectation EKS across zone scheduling Prerequisite: * EKS Cluster with 3 nodes across two AWS zones (zone#1, zone#2) 1. Create a volume with 2 replicas, and attach it to a node. 2. Delete a replica scheduled to each zone, repeat it few times 3. Scale volume replicas = 3 4. Scale volume replicas to 4 * Volume replicas should be scheduled one per AWS zone 9. Upgrade https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/upgrade/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/upgrade/ - # Test name Description 1 Higher version of Longhorn engine and lower version of volume Test Longhorn upgrade -1. Create a volume, generate and write data into the volume. -2. Keep the volume attached, then upgrade Longhorn system. -3. Write data in volume. -4. Take snapshot#1. Compute the checksum#1 -5. Write data to volume. Compute the checksum#2 -6. Take backup -7. Revert to snapshot#1 -8. Restore the backup. 2 Restore the backup taken with older engine version 1. + # Test name Description 1 Higher version of Longhorn engine and lower version of volume Test Longhorn upgrade 1. Create a volume, generate and write data into the volume. 2. Keep the volume attached, then upgrade Longhorn system. 3. Write data in volume. 4. Take snapshot#1. Compute the checksum#1 5. Write data to volume. Compute the checksum#2 6. Take backup 7. Revert to snapshot#1 8. Restore the backup. 2 Restore the backup taken with older engine version 1. Monitoring https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/monitoring/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/functional-test-cases/monitoring/ - Prometheus Support test cases Install the Prometheus Operator (include a role and service account for it). 
For example:apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: prometheus-operator namespace: default roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: prometheus-operator subjects: -- kind: ServiceAccount name: prometheus-operator namespace: default -&ndash; apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: prometheus-operator namespace: default rules: -- apiGroups: -- extensions resources: -- thirdpartyresources verbs: [&quot;&quot;] -- apiGroups: -- apiextensions.k8s.io resources: -- customresourcedefinitions verbs: [&quot;&quot;] + Prometheus Support test cases Install the Prometheus Operator (include a role and service account for it). For example:apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRoleBinding metadata: name: prometheus-operator namespace: default roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: prometheus-operator subjects: - kind: ServiceAccount name: prometheus-operator namespace: default &ndash; apiVersion: rbac.authorization.k8s.io/v1 kind: ClusterRole metadata: name: prometheus-operator namespace: default rules: - apiGroups: - extensions resources: - thirdpartyresources verbs: [&quot;&quot;] - apiGroups: - apiextensions.k8s.io resources: - customresourcedefinitions verbs: [&quot;&quot;] diff --git a/manual/pre-release/basic-operations/index.xml b/manual/pre-release/basic-operations/index.xml index b961c7c1ca..c38b1bf14f 100644 --- a/manual/pre-release/basic-operations/index.xml +++ b/manual/pre-release/basic-operations/index.xml @@ -12,22 +12,14 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/basic-operations/snapshot-while-writing-data/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/basic-operations/snapshot-while-writing-data/ - Related issue: https://github.com/longhorn/longhorn/issues/2187 -Scenario Create a kubernetes pod + pvc that mounts a Longhorn volume. Write 5 Gib into the pod using dd if=/dev/urandom of=/mnt/&lt;volume&gt; count=5000 bs=1M conv=fsync status=progress While running the above command initiate a snapshot. Verify the logs of the instance-manager using kubetail instance-manager -n longhorn-system. There should some logs related to freezing and unfreezing the filesystem. Like Froze filesystem of volume mounted ... Verify snapshot succeeded and dd operation will complete. + Related issue: https://github.com/longhorn/longhorn/issues/2187 Scenario Create a kubernetes pod + pvc that mounts a Longhorn volume. Write 5 Gib into the pod using dd if=/dev/urandom of=/mnt/&lt;volume&gt; count=5000 bs=1M conv=fsync status=progress While running the above command initiate a snapshot. Verify the logs of the instance-manager using kubetail instance-manager -n longhorn-system. There should some logs related to freezing and unfreezing the filesystem. Like Froze filesystem of volume mounted ... Verify snapshot succeeded and dd operation will complete. Storage Network Test https://longhorn.github.io/longhorn-tests/manual/pre-release/basic-operations/storage-network/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/basic-operations/storage-network/ - Related issue: https://github.com/longhorn/longhorn/issues/2285 -Test Multus version below v4.0.0 Given Set up the Longhorn environment as mentioned here -When Run Longhorn core tests on the environment. -Then All the tests should pass. 
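As an alternative to tailing with kubetail, the instance-manager logs can be scanned programmatically for the freeze/unfreeze messages mentioned above. This is only a sketch using the official kubernetes Python client; the exact log text (e.g. "Froze filesystem of volume mounted ...") can vary between Longhorn versions, so treat the matched pattern as a placeholder.

# Sketch (not a documented step): grep every instance-manager pod's log in
# longhorn-system for the filesystem freeze/unfreeze messages described above.
# Uses the official `kubernetes` Python client; the matched text is a placeholder
# since the exact wording can differ between Longhorn versions.
from kubernetes import client, config

def grep_instance_manager_logs(pattern="froze filesystem",
                               namespace="longhorn-system"):
    config.load_kube_config()  # or config.load_incluster_config()
    core = client.CoreV1Api()
    for pod in core.list_namespaced_pod(namespace).items:
        if not pod.metadata.name.startswith("instance-manager"):
            continue
        log = core.read_namespaced_pod_log(pod.metadata.name, namespace)
        for line in log.splitlines():
            if pattern.lower() in line.lower():
                print(f"{pod.metadata.name}: {line}")

if __name__ == "__main__":
    grep_instance_manager_logs()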
-Related issue: https://github.com/longhorn/longhorn/issues/6953 -Test Multus version above v4.0.0 Given Set up the Longhorn environment as mentioned here -When Run Longhorn core tests on the environment. -Then All the tests should pass. + Related issue: https://github.com/longhorn/longhorn/issues/2285 Test Multus version below v4.0.0 Given Set up the Longhorn environment as mentioned here When Run Longhorn core tests on the environment. Then All the tests should pass. Related issue: https://github.com/longhorn/longhorn/issues/6953 Test Multus version above v4.0.0 Given Set up the Longhorn environment as mentioned here When Run Longhorn core tests on the environment. Then All the tests should pass. diff --git a/manual/pre-release/cluster-restore/index.xml b/manual/pre-release/cluster-restore/index.xml index ea6fa53745..cdd7f616cc 100644 --- a/manual/pre-release/cluster-restore/index.xml +++ b/manual/pre-release/cluster-restore/index.xml @@ -19,8 +19,7 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/cluster-restore/restore-to-an-old-cluster/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/cluster-restore/restore-to-an-old-cluster/ - Notice that the behaviors will be different if the cluster node roles are different. e.g., A cluster contains 1 dedicated master node + 3 worker node is different from a cluster contains 3 nodes which are both master and worker. This test may need to be validated for both kind of cluster. -Node creation and deletion Deploy a 3-worker-node cluster then install Longhorn system. Deploy some workloads using Longhorn volumes then write some data. + Notice that the behaviors will be different if the cluster node roles are different. e.g., A cluster contains 1 dedicated master node + 3 worker node is different from a cluster contains 3 nodes which are both master and worker. This test may need to be validated for both kind of cluster. Node creation and deletion Deploy a 3-worker-node cluster then install Longhorn system. Deploy some workloads using Longhorn volumes then write some data. diff --git a/manual/pre-release/environment/index.xml b/manual/pre-release/environment/index.xml index 52ee71700c..5b6b893ab5 100644 --- a/manual/pre-release/environment/index.xml +++ b/manual/pre-release/environment/index.xml @@ -26,16 +26,14 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/environment/rke2-cis-1.6-profile/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/environment/rke2-cis-1.6-profile/ - Related issue This test was created in response to 2292, which used CSI-1.5. However, RKE2 generally does not recommend or encourage using CIS-1.5 in favor of CIS1.6. -Scenario Prepare 1 control plane node and 3 worker nodes. Install RKE2 v1.24 with CIS-1.6 profile on 1 control plane node. sudo su - systemctl disable firewalld # On a supporting OS. systemctl stop firewalld # On a supporting OS. yum install iscsi-initiator-utils # Or the OS equivalent. + Related issue This test was created in response to 2292, which used CSI-1.5. However, RKE2 generally does not recommend or encourage using CIS-1.5 in favor of CIS1.6. Scenario Prepare 1 control plane node and 3 worker nodes. Install RKE2 v1.24 with CIS-1.6 profile on 1 control plane node. sudo su - systemctl disable firewalld # On a supporting OS. systemctl stop firewalld # On a supporting OS. yum install iscsi-initiator-utils # Or the OS equivalent. 
Test Longhorn deployment on RKE2 v1.25+ with CIS-1.23 profile https://longhorn.github.io/longhorn-tests/manual/pre-release/environment/rke2-cis-1.23-profile/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/environment/rke2-cis-1.23-profile/ - Related issue This is an expansion of Test Longhorn deployment on RKE2 v1.24- with CIS-1.6 profile, which was created in response to 2292. However, later versions of RKE2 only support CIS-1.23. -Scenario Prepare 1 control plane node and 3 worker nodes. Install the latest RKE2 with CIS-1.23 profile on 1 control plane node. sudo su - systemctl disable firewalld # On a supporting OS. systemctl stop firewalld # On a supporting OS. + Related issue This is an expansion of Test Longhorn deployment on RKE2 v1.24- with CIS-1.6 profile, which was created in response to 2292. However, later versions of RKE2 only support CIS-1.23. Scenario Prepare 1 control plane node and 3 worker nodes. Install the latest RKE2 with CIS-1.23 profile on 1 control plane node. sudo su - systemctl disable firewalld # On a supporting OS. systemctl stop firewalld # On a supporting OS. diff --git a/manual/pre-release/ha/index.xml b/manual/pre-release/ha/index.xml index 9e9ab27b1e..f698d971bd 100644 --- a/manual/pre-release/ha/index.xml +++ b/manual/pre-release/ha/index.xml @@ -26,10 +26,7 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/ha/ha-volume-migration/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/ha/ha-volume-migration/ - Create a migratable volume: -Deploy a migratable StorageClass. e.g., https://github.com/longhorn/longhorn/blob/master/examples/rwx/storageclass-migratable.yaml Create a PVC with access mode ReadWriteMany via this StorageClass. Attach a volume to a node and wait for volume running. Then write some data into the volume. Here I would recommend directly restoring a volume (set fromBackup in the StorageClass) and attach it instead. -Start the migration by request attaching to another node for the volume. -Trigger the following scenarios then confirm or rollback the migration: + Create a migratable volume: Deploy a migratable StorageClass. e.g., https://github.com/longhorn/longhorn/blob/master/examples/rwx/storageclass-migratable.yaml Create a PVC with access mode ReadWriteMany via this StorageClass. Attach a volume to a node and wait for volume running. Then write some data into the volume. Here I would recommend directly restoring a volume (set fromBackup in the StorageClass) and attach it instead. Start the migration by request attaching to another node for the volume. Trigger the following scenarios then confirm or rollback the migration: Replica Rebuilding diff --git a/manual/pre-release/managed-kubernetes-clusters/aks/index.xml b/manual/pre-release/managed-kubernetes-clusters/aks/index.xml index 3ffff3ac05..5e0519e906 100644 --- a/manual/pre-release/managed-kubernetes-clusters/aks/index.xml +++ b/manual/pre-release/managed-kubernetes-clusters/aks/index.xml @@ -12,23 +12,14 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/aks/expand-volume/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/aks/expand-volume/ - Create AKS cluster with 3 nodes and install Longhorn. -Create deployment and write some data to it. -In Longhorn, set replica-replenishment-wait-interval to 0. -Add a new node-pool. 
Later Longhorn components will be automatically deployed on the nodes in this pool. -AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_RESOURCE_GROUP=&lt;aks-resource-group&gt; AKS_CLUSTER_NAME=&lt;aks-cluster-name&gt; AKS_DISK_SIZE_NEW=&lt;new-disk-size-in-gb&gt; AKS_NODE_NUM=&lt;number-of-nodes&gt; AKS_K8S_VERSION=&lt;kubernetes-version&gt; az aks nodepool add \ --resource-group ${AKS_RESOURCE_GROUP} \ --cluster-name ${AKS_CLUSTER_NAME} \ --name ${AKS_NODEPOOL_NAME_NEW} \ --node-count ${AKS_NODE_NUM} \ --node-osdisk-size ${AKS_DISK_SIZE_NEW} \ --kubernetes-version ${AKS_K8S_VERSION} \ --mode System Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. + Create AKS cluster with 3 nodes and install Longhorn. Create deployment and write some data to it. In Longhorn, set replica-replenishment-wait-interval to 0. Add a new node-pool. Later Longhorn components will be automatically deployed on the nodes in this pool. AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_RESOURCE_GROUP=&lt;aks-resource-group&gt; AKS_CLUSTER_NAME=&lt;aks-cluster-name&gt; AKS_DISK_SIZE_NEW=&lt;new-disk-size-in-gb&gt; AKS_NODE_NUM=&lt;number-of-nodes&gt; AKS_K8S_VERSION=&lt;kubernetes-version&gt; az aks nodepool add \ --resource-group ${AKS_RESOURCE_GROUP} \ --cluster-name ${AKS_CLUSTER_NAME} \ --name ${AKS_NODEPOOL_NAME_NEW} \ --node-count ${AKS_NODE_NUM} \ --node-osdisk-size ${AKS_DISK_SIZE_NEW} \ --kubernetes-version ${AKS_K8S_VERSION} \ --mode System Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. [Upgrade K8s](https://longhorn.io/docs/1.3.0/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-aks/) https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/aks/upgrade-k8s/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/aks/upgrade-k8s/ - Create AKS cluster with 3 nodes and install Longhorn. -Create deployment and write some data to it. -In Longhorn, set replica-replenishment-wait-interval to 0. -Upgrade AKS control plane. -AKS_RESOURCE_GROUP=&lt;aks-resource-group&gt; AKS_CLUSTER_NAME=&lt;aks-cluster-name&gt; AKS_K8S_VERSION_UPGRADE=&lt;aks-k8s-version&gt; az aks upgrade \ --resource-group ${AKS_RESOURCE_GROUP} \ --name ${AKS_CLUSTER_NAME} \ --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ --control-plane-only Add a new node-pool. -AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_DISK_SIZE=&lt;disk-size-in-gb&gt; AKS_NODE_NUM=&lt;number-of-nodes&gt; az aks nodepool add \ --resource-group ${AKS_RESOURCE_GROUP} \ --cluster-name ${AKS_CLUSTER_NAME} \ --name ${AKS_NODEPOOL_NAME_NEW} \ --node-count ${AKS_NODE_NUM} \ --node-osdisk-size ${AKS_DISK_SIZE} \ --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ --mode System Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. + Create AKS cluster with 3 nodes and install Longhorn. Create deployment and write some data to it. In Longhorn, set replica-replenishment-wait-interval to 0. Upgrade AKS control plane. AKS_RESOURCE_GROUP=&lt;aks-resource-group&gt; AKS_CLUSTER_NAME=&lt;aks-cluster-name&gt; AKS_K8S_VERSION_UPGRADE=&lt;aks-k8s-version&gt; az aks upgrade \ --resource-group ${AKS_RESOURCE_GROUP} \ --name ${AKS_CLUSTER_NAME} \ --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ --control-plane-only Add a new node-pool. 
AKS_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; AKS_DISK_SIZE=&lt;disk-size-in-gb&gt; AKS_NODE_NUM=&lt;number-of-nodes&gt; az aks nodepool add \ --resource-group ${AKS_RESOURCE_GROUP} \ --cluster-name ${AKS_CLUSTER_NAME} \ --name ${AKS_NODEPOOL_NAME_NEW} \ --node-count ${AKS_NODE_NUM} \ --node-osdisk-size ${AKS_DISK_SIZE} \ --kubernetes-version ${AKS_K8S_VERSION_UPGRADE} \ --mode System Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. diff --git a/manual/pre-release/managed-kubernetes-clusters/gke/index.xml b/manual/pre-release/managed-kubernetes-clusters/gke/index.xml index 408e69ca50..f63781e7ab 100644 --- a/manual/pre-release/managed-kubernetes-clusters/gke/index.xml +++ b/manual/pre-release/managed-kubernetes-clusters/gke/index.xml @@ -12,11 +12,7 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/gke/expand-volume/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/managed-kubernetes-clusters/gke/expand-volume/ - Create GKE cluster with 3 nodes and install Longhorn. -Create deployment and write some data to it. -In Longhorn, set replica-replenishment-wait-interval to 0. -Add a new node-pool. Later Longhorn components will be automatically deployed on the nodes in this pool. -GKE_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; GKE_REGION=&lt;gke-region&gt; GKE_CLUSTER_NAME=&lt;gke-cluster-name&gt; GKE_IMAGE_TYPE=Ubuntu GKE_MACHINE_TYPE=&lt;gcp-machine-type&gt; GKE_DISK_SIZE_NEW=&lt;new-disk-size-in-gb&gt; GKE_NODE_NUM=&lt;number-of-nodes&gt; gcloud container node-pools create ${GKE_NODEPOOL_NAME_NEW} \ --region ${GKE_REGION} \ --cluster ${GKE_CLUSTER_NAME} \ --image-type ${GKE_IMAGE_TYPE} \ --machine-type ${GKE_MACHINE_TYPE} \ --disk-size ${GKE_DISK_SIZE_NEW} \ --num-nodes ${GKE_NODE_NUM} gcloud container node-pools list \ --zone ${GKE_REGION} \ --cluster ${GKE_CLUSTER_NAME} Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. + Create GKE cluster with 3 nodes and install Longhorn. Create deployment and write some data to it. In Longhorn, set replica-replenishment-wait-interval to 0. Add a new node-pool. Later Longhorn components will be automatically deployed on the nodes in this pool. GKE_NODEPOOL_NAME_NEW=&lt;new-nodepool-name&gt; GKE_REGION=&lt;gke-region&gt; GKE_CLUSTER_NAME=&lt;gke-cluster-name&gt; GKE_IMAGE_TYPE=Ubuntu GKE_MACHINE_TYPE=&lt;gcp-machine-type&gt; GKE_DISK_SIZE_NEW=&lt;new-disk-size-in-gb&gt; GKE_NODE_NUM=&lt;number-of-nodes&gt; gcloud container node-pools create ${GKE_NODEPOOL_NAME_NEW} \ --region ${GKE_REGION} \ --cluster ${GKE_CLUSTER_NAME} \ --image-type ${GKE_IMAGE_TYPE} \ --machine-type ${GKE_MACHINE_TYPE} \ --disk-size ${GKE_DISK_SIZE_NEW} \ --num-nodes ${GKE_NODE_NUM} gcloud container node-pools list \ --zone ${GKE_REGION} \ --cluster ${GKE_CLUSTER_NAME} Using Longhorn UI to disable the disk scheduling and request eviction for nodes in the old node-pool. 
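After the replicas have been evicted from the old node-pool in the cases above, the old pool is normally removed; a hedged sketch for both providers, with AKS_NODEPOOL_NAME_OLD and GKE_NODEPOOL_NAME_OLD as placeholder names for the pool being retired.

    # AKS: remove the retired node-pool once eviction completes.
    az aks nodepool delete \
      --resource-group ${AKS_RESOURCE_GROUP} \
      --cluster-name ${AKS_CLUSTER_NAME} \
      --name ${AKS_NODEPOOL_NAME_OLD}

    # GKE: remove the retired node-pool once eviction completes.
    gcloud container node-pools delete ${GKE_NODEPOOL_NAME_OLD} \
      --region ${GKE_REGION} \
      --cluster ${GKE_CLUSTER_NAME} \
      --quiet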
[Upgrade K8s](https://longhorn.io/docs/1.3.0/advanced-resources/support-managed-k8s-service/upgrade-k8s-on-gke/) diff --git a/manual/pre-release/node-not-ready/kubelet-restart/index.xml b/manual/pre-release/node-not-ready/kubelet-restart/index.xml index e600920860..2387e5dadc 100644 --- a/manual/pre-release/node-not-ready/kubelet-restart/index.xml +++ b/manual/pre-release/node-not-ready/kubelet-restart/index.xml @@ -12,8 +12,7 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/kubelet-restart/kubelet-restart-on-a-node/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/kubelet-restart/kubelet-restart-on-a-node/ - Related issues: https://github.com/longhorn/longhorn/issues/2629 -Case 1: Kubelet restart on RKE1 multi node cluster: Create a RKE1 cluster with config of 1 etcd/control plane and 3 worker nodes. Deploy Longhorn on the cluster. Deploy prometheus monitoring app on the cluster which is using Longhorn storage class or deploy a statefulSet with Longhorn volume. Write some data into the mount point and compute the md5sum. Restart the kubelet on the node where the statefulSet or Prometheus pod is running using the command sudo docker restart kubelet Observe the volume. + Related issues: https://github.com/longhorn/longhorn/issues/2629 Case 1: Kubelet restart on RKE1 multi node cluster: Create a RKE1 cluster with config of 1 etcd/control plane and 3 worker nodes. Deploy Longhorn on the cluster. Deploy prometheus monitoring app on the cluster which is using Longhorn storage class or deploy a statefulSet with Longhorn volume. Write some data into the mount point and compute the md5sum. Restart the kubelet on the node where the statefulSet or Prometheus pod is running using the command sudo docker restart kubelet Observe the volume. diff --git a/manual/pre-release/node-not-ready/node-disconnection/index.xml b/manual/pre-release/node-not-ready/node-disconnection/index.xml index a051685acb..ea25ef154b 100644 --- a/manual/pre-release/node-not-ready/node-disconnection/index.xml +++ b/manual/pre-release/node-not-ready/node-disconnection/index.xml @@ -12,11 +12,7 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-disconnection/node-disconnection/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-disconnection/node-disconnection/ - https://github.com/longhorn/longhorn/issues/1545 For disconnect node : https://github.com/longhorn/longhorn/files/4864127/network_down.sh.zip -If auto-salvage is disabled, the auto-reattachment behavior after the node disconnection depends on all replicas are in ERROR state or not. -(1) If all replicas are in ERROR state, the volume would remain in detached/faulted state if auto-salvage is disabled. -(2) If there is any healthy replica, the volume would be auto-reattached even though auto-salvage is disabled. -What makes all replicas in ERROR state? When there is data writing during the disconnection: + https://github.com/longhorn/longhorn/issues/1545 For disconnect node : https://github.com/longhorn/longhorn/files/4864127/network_down.sh.zip If auto-salvage is disabled, the auto-reattachment behavior after the node disconnection depends on all replicas are in ERROR state or not. (1) If all replicas are in ERROR state, the volume would remain in detached/faulted state if auto-salvage is disabled. 
(2) If there is any healthy replica, the volume would be auto-reattached even though auto-salvage is disabled. What makes all replicas in ERROR state? When there is data writing during the disconnection: diff --git a/manual/pre-release/node-not-ready/node-down/index.xml b/manual/pre-release/node-not-ready/node-down/index.xml index a4388ac175..603e70680c 100644 --- a/manual/pre-release/node-not-ready/node-down/index.xml +++ b/manual/pre-release/node-not-ready/node-down/index.xml @@ -12,13 +12,7 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/restore-volume-node-down/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/restore-volume-node-down/ - Case 1: Create a backup. -Restore the above backup. -Power off the volume attached node during the restoring. -Wait for the Longhorn node down. -Wait for the restore volume being reattached and starting restoring volume with state Degraded. -Wait for the restore complete. -Note: During the restoration process, if the engine process fails to communicate with a replica, all replicas will be marked as ERR, and the volume&rsquo;s RestoreRequired status cannot be set to false. + Case 1: Create a backup. Restore the above backup. Power off the volume attached node during the restoring. Wait for the Longhorn node down. Wait for the restore volume being reattached and starting restoring volume with state Degraded. Wait for the restore complete. Note: During the restoration process, if the engine process fails to communicate with a replica, all replicas will be marked as ERR, and the volume&rsquo;s RestoreRequired status cannot be set to false. Backing Image on a down node @@ -32,28 +26,21 @@ Note: During the restoration process, if the engine process fails to communicate https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/node-drain-deletion/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/node-drain-deletion/ - Drain with force Make sure the volumes on the drained/removed node can be detached or recovered correctly. The related issue: https://github.com/longhorn/longhorn/issues/1214 -Deploy a cluster contains 3 worker nodes N1, N2, N3. Deploy Longhorn. Create a 1-replica deployment with a 3-replica Longhorn volume. The volume is attached to N1. Write some data to the volume and get the md5sum. Force drain and remove N2, which contains one replica only. kubectl drain &lt;Node name&gt; --delete-emptydir-data=true --force=true --grace-period=-1 --ignore-daemonsets=true --timeout=&lt;Desired timeout in secs&gt; Wait for the volume Degraded. + Drain with force Make sure the volumes on the drained/removed node can be detached or recovered correctly. The related issue: https://github.com/longhorn/longhorn/issues/1214 Deploy a cluster contains 3 worker nodes N1, N2, N3. Deploy Longhorn. Create a 1-replica deployment with a 3-replica Longhorn volume. The volume is attached to N1. Write some data to the volume and get the md5sum. Force drain and remove N2, which contains one replica only. kubectl drain &lt;Node name&gt; --delete-emptydir-data=true --force=true --grace-period=-1 --ignore-daemonsets=true --timeout=&lt;Desired timeout in secs&gt; Wait for the volume Degraded. 
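For the node-disconnection case above, the linked network_down.sh attachment is the script of record; if it is unavailable, a rough stand-in (not the original script, only an approximation) is to drop all non-SSH TCP traffic on the attached node for a fixed window.

    # Run on the volume-attached node; blocks TCP traffic except SSH for 100 seconds, then restores it.
    sudo iptables -I INPUT -p tcp ! --dport 22 -j DROP
    sudo iptables -I OUTPUT -p tcp ! --sport 22 -j DROP
    sleep 100
    sudo iptables -D INPUT -p tcp ! --dport 22 -j DROP
    sudo iptables -D OUTPUT -p tcp ! --sport 22 -j DROP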
Physical node down https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/physical-node-down/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/physical-node-down/ - One physical node down should result in the state of that node change to Down. -When using with CSI driver, one node with controller (StatefulSet/Deployment) and pod down should result in Kubernetes migrate the pod to another node, and Longhorn volume should be able to be used on that node as well. Test scenarios for this are documented here. -Note: -In this case, RWX should be excluded. -Ref: https://github.com/longhorn/longhorn/issues/5900#issuecomment-1541360552 + One physical node down should result in the state of that node change to Down. When using with CSI driver, one node with controller (StatefulSet/Deployment) and pod down should result in Kubernetes migrate the pod to another node, and Longhorn volume should be able to be used on that node as well. Test scenarios for this are documented here. Note: In this case, RWX should be excluded. Ref: https://github.com/longhorn/longhorn/issues/5900#issuecomment-1541360552 Single replica node down https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/single-replica-node-down/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/node-not-ready/node-down/single-replica-node-down/ - Related Issues https://github.com/longhorn/longhorn/issues/2329 https://github.com/longhorn/longhorn/issues/2309 https://github.com/longhorn/longhorn/issues/3957 -Default Setting Automatic salvage is enabled. -Node restart/down scenario with Pod Deletion Policy When Node is Down set to default value do-nothing. Create RWO|RWX volume with replica count = 1 &amp; data locality = enabled|disabled|strict-local. For data locality = strict-local, use RWO volume to do test. Create deployment|statefulset for volume. Power down node of volume/replica. The workload pod will get stuck in the terminating state. Volume will fail to attach since volume is not ready (i. + Related Issues https://github.com/longhorn/longhorn/issues/2329 https://github.com/longhorn/longhorn/issues/2309 https://github.com/longhorn/longhorn/issues/3957 Default Setting Automatic salvage is enabled. Node restart/down scenario with Pod Deletion Policy When Node is Down set to default value do-nothing. Create RWO|RWX volume with replica count = 1 &amp; data locality = enabled|disabled|strict-local. For data locality = strict-local, use RWO volume to do test. Create deployment|statefulset for volume. Power down node of volume/replica. The workload pod will get stuck in the terminating state. Volume will fail to attach since volume is not ready (i. Test node deletion diff --git a/manual/pre-release/resiliency/index.xml b/manual/pre-release/resiliency/index.xml index 5ff9c466d0..8c5f24205b 100644 --- a/manual/pre-release/resiliency/index.xml +++ b/manual/pre-release/resiliency/index.xml @@ -12,25 +12,21 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/simulated-slow-disk/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/simulated-slow-disk/ - This case requires the creation of a slow virtual disk with dmsetup. 
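For the physical-node-down and single-replica-node-down cases above, one way to simulate an abrupt power-off (no clean shutdown) is the sysrq trigger; this is a sketch and assumes sysrq is permitted on the host.

    # On the node to "power off" abruptly:
    echo 1 | sudo tee /proc/sys/kernel/sysrq
    echo o | sudo tee /proc/sysrq-trigger
    # From another machine (separate terminals), watch Longhorn mark the node Down and the volume state change:
    kubectl -n longhorn-system get nodes.longhorn.io -w
    kubectl -n longhorn-system get volumes.longhorn.io -w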
-Make a slow disk: -Make a disk image file: truncate -s 10g slow.img Create a loopback device: losetup --show -P -f slow.img Get the block size of the loopback device: blockdev --getsize /dev/loopX Create slow device: echo &quot;0 &lt;blocksize&gt; delay /dev/loopX 0 500&quot; | dmsetup create dm-slow Format slow device: mkfs.ext4 /dev/mapper/dm-slow Mount slow device: mount /dev/mapper/dm-slow /mnt Build longhorn-engine and run it on the slow disk. + This case requires the creation of a slow virtual disk with dmsetup. Make a slow disk: Make a disk image file: truncate -s 10g slow.img Create a loopback device: losetup --show -P -f slow.img Get the block size of the loopback device: blockdev --getsize /dev/loopX Create slow device: echo &quot;0 &lt;blocksize&gt; delay /dev/loopX 0 500&quot; | dmsetup create dm-slow Format slow device: mkfs.ext4 /dev/mapper/dm-slow Mount slow device: mount /dev/mapper/dm-slow /mnt Build longhorn-engine and run it on the slow disk. PVC provisioning with insufficient storage https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/pvc_provisioning_with_insufficient_storage/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/pvc_provisioning_with_insufficient_storage/ - Related Issue: https://github.com/longhorn/longhorn/issues/4654 https://github.com/longhorn/longhorn/issues/3529 Root Cause Analysis https://github.com/longhorn/longhorn/issues/4654#issuecomment-1264870672 This case need to be tested on both RWO/RWX volumes -Create a PVC with size larger than 8589934591 GiB. Deployment keep in pending status, RWO/RWX volume will keep in a create -&gt; delete loop. Create a PVC with size &lt;= 8589934591 GiB, but greater than the actual available space size. RWO/RWX volume will be created, and volume will have annotation &ldquo;longhorn.io/volume-scheduling-error&rdquo;: &ldquo;insufficient storage volume scheduling failure&rdquo; in it. + Related Issue: https://github.com/longhorn/longhorn/issues/4654 https://github.com/longhorn/longhorn/issues/3529 Root Cause Analysis https://github.com/longhorn/longhorn/issues/4654#issuecomment-1264870672 This case need to be tested on both RWO/RWX volumes Create a PVC with size larger than 8589934591 GiB. Deployment keep in pending status, RWO/RWX volume will keep in a create -&gt; delete loop. Create a PVC with size &lt;= 8589934591 GiB, but greater than the actual available space size. RWO/RWX volume will be created, and volume will have annotation &ldquo;longhorn.io/volume-scheduling-error&rdquo;: &ldquo;insufficient storage volume scheduling failure&rdquo; in it. Test Longhorn components recovery https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/test-longhorn-component-recovery/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/resiliency/test-longhorn-component-recovery/ - This is a simple test is check if all the components are recoverable. -Test data setup: Deploy Longhorn on a 3 nodes cluster. Create a volume vol-1 using Longhorn UI. Create a volume vol-2 using the Longhorn storage class. Create a volume vol-3 with backing image. Create an RWX volume vol-4. Write some data in all the volumes created and compute the md5sum. Have all the volumes in attached state. Test steps: Delete the IM-e from every volume and make sure every volume recovers. + This is a simple test is check if all the components are recoverable. Test data setup: Deploy Longhorn on a 3 nodes cluster. 
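The simulated-slow-disk steps above, gathered into one runnable sequence (run as root; the loop device path is captured instead of hard-coding /dev/loopX).

    truncate -s 10g slow.img                              # disk image file
    LOOP=$(losetup --show -P -f slow.img)                 # loopback device, e.g. /dev/loop0
    SECTORS=$(blockdev --getsize "${LOOP}")               # device size in 512-byte sectors
    echo "0 ${SECTORS} delay ${LOOP} 0 500" | dmsetup create dm-slow   # 500 ms delay on every I/O
    mkfs.ext4 /dev/mapper/dm-slow
    mount /dev/mapper/dm-slow /mnt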
Create a volume vol-1 using Longhorn UI. Create a volume vol-2 using the Longhorn storage class. Create a volume vol-3 with backing image. Create an RWX volume vol-4. Write some data in all the volumes created and compute the md5sum. Have all the volumes in attached state. Test steps: Delete the IM-e from every volume and make sure every volume recovers. Test timeout on loss of network connectivity diff --git a/manual/pre-release/upgrade/index.xml b/manual/pre-release/upgrade/index.xml index f511614a30..57285fab79 100644 --- a/manual/pre-release/upgrade/index.xml +++ b/manual/pre-release/upgrade/index.xml @@ -12,25 +12,21 @@ https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/auto-upgrade-engine/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/auto-upgrade-engine/ - Longhorn version >= 1.1.1 Reference ticket 2152 -Test basic upgrade Install old Longhorn version. E.g., &lt;= v1.0.2 Create a volume, attach it to a pod, write some data. Create a DR volume and leave it in the detached state. Upgrade to Longhorn master Set setting concurrent automatic engine upgrade per node limit to 3 Verify that volumes&rsquo; engines are upgraded automatically. Test concurrent upgrade Create a StatefulSet of scale 10 using 10 Longhorn volume. + Longhorn version >= 1.1.1 Reference ticket 2152 Test basic upgrade Install old Longhorn version. E.g., &lt;= v1.0.2 Create a volume, attach it to a pod, write some data. Create a DR volume and leave it in the detached state. Upgrade to Longhorn master Set setting concurrent automatic engine upgrade per node limit to 3 Verify that volumes&rsquo; engines are upgraded automatically. Test concurrent upgrade Create a StatefulSet of scale 10 using 10 Longhorn volume. Kubernetes upgrade test https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/kubernetes-upgrade-test/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/kubernetes-upgrade-test/ - We also need to cover the Kubernetes upgrade process for supported Kubernetes version, make sure pod and volumes works after a major version upgrade. -Related Issue https://github.com/longhorn/longhorn/issues/2566 -Test with K8s upgrade Create a K8s (Immediate prior version) cluster with 3 worker nodes and 1 control plane. Deploy Longhorn version (Immediate prior version) on the cluster. Create a volume and attach to a pod. Write data to the volume and compute the checksum. + We also need to cover the Kubernetes upgrade process for supported Kubernetes version, make sure pod and volumes works after a major version upgrade. Related Issue https://github.com/longhorn/longhorn/issues/2566 Test with K8s upgrade Create a K8s (Immediate prior version) cluster with 3 worker nodes and 1 control plane. Deploy Longhorn version (Immediate prior version) on the cluster. Create a volume and attach to a pod. Write data to the volume and compute the checksum. Longhorn Upgrade test https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/longhorn-upgrade-test/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/longhorn-upgrade-test/ - Setup 2 attached volumes with data. 2 detached volumes with data. 2 new volumes without data. 2 deployments of one pod. 1 statefulset of 10 pods. Auto Salvage set to disable. Test After upgrade: -Make sure the existing instance managers didn&rsquo;t restart. Make sure pods didn&rsquo;t restart. 
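For the auto-upgrade-engine case above, the per-node limit can also be set from the CLI; this sketch assumes the setting is the settings.longhorn.io object named concurrent-automatic-engine-upgrade-per-node-limit with a merge-patchable value field.

    kubectl -n longhorn-system patch settings.longhorn.io concurrent-automatic-engine-upgrade-per-node-limit \
      --type merge -p '{"value":"3"}'
    # Watch engine images and volumes converge onto the new engine image.
    kubectl -n longhorn-system get engineimages.longhorn.io
    kubectl -n longhorn-system get volumes.longhorn.io -o wide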
Check the contents of the volumes. If the Engine API version is incompatible, manager cannot do anything about the attached volumes except detaching it. + Setup 2 attached volumes with data. 2 detached volumes with data. 2 new volumes without data. 2 deployments of one pod. 1 statefulset of 10 pods. Auto Salvage set to disable. Test After upgrade: Make sure the existing instance managers didn&rsquo;t restart. Make sure pods didn&rsquo;t restart. Check the contents of the volumes. If the Engine API version is incompatible, manager cannot do anything about the attached volumes except detaching it. Re-deploy CSI components when their images change @@ -58,10 +54,7 @@ Make sure the existing instance managers didn&rsquo;t restart. Make sure pod https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/test-node-drain-policy/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/pre-release/upgrade/test-node-drain-policy/ - With node-drain-policy is block-if-contains-last-replica Note: Starting from v1.5.x, it is not necessary to check for the presence of longhorn-admission-webhook and longhorn-conversion-webhook. Please refer to the Longhorn issue #5590 for more details. -Starting from v1.5.x, observe that the instance-manager-r and instance-manager-e are combined into instance-manager. Ref 5208 -1. Basic unit tests 1.1 Single worker node cluster with separate master node 1.1.1 RWO volumes -Deploy Longhorn Verify that there is no PDB for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook Manually create a PVC (simulate the volume which has never been attached scenario) Verify that there is no PDB for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook because there is no attached volume Create a deployment that uses one RW0 Longhorn volume. + With node-drain-policy is block-if-contains-last-replica Note: Starting from v1.5.x, it is not necessary to check for the presence of longhorn-admission-webhook and longhorn-conversion-webhook. Please refer to the Longhorn issue #5590 for more details. Starting from v1.5.x, observe that the instance-manager-r and instance-manager-e are combined into instance-manager. Ref 5208 1. Basic unit tests 1.1 Single worker node cluster with separate master node 1.1.1 RWO volumes Deploy Longhorn Verify that there is no PDB for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook Manually create a PVC (simulate the volume which has never been attached scenario) Verify that there is no PDB for csi-attacher, csi-provisioner, longhorn-admission-webhook, and longhorn-conversion-webhook because there is no attached volume Create a deployment that uses one RW0 Longhorn volume. 
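For the node-drain-policy checks above, the PDB assertions can be scripted; a sketch that assumes the drain policy is exposed as the node-drain-policy setting object and that the relevant PDBs live in longhorn-system with the component names in their object names.

    kubectl -n longhorn-system patch settings.longhorn.io node-drain-policy \
      --type merge -p '{"value":"block-if-contains-last-replica"}'
    # List PDBs and check whether csi-attacher / csi-provisioner / instance-manager entries exist.
    kubectl -n longhorn-system get poddisruptionbudgets
    kubectl -n longhorn-system get poddisruptionbudgets | grep -E 'csi-attacher|csi-provisioner|instance-manager' \
      || echo "no matching PDB"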
Test system upgrade with a new storage class being default diff --git a/manual/rancher-integration/index.xml b/manual/rancher-integration/index.xml index 906bcb478b..d1c2a3c8fa 100644 --- a/manual/rancher-integration/index.xml +++ b/manual/rancher-integration/index.xml @@ -12,74 +12,42 @@ https://longhorn.github.io/longhorn-tests/manual/rancher-integration/access-lh-gui-using-rancher-ui/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/access-lh-gui-using-rancher-ui/ - Given Downstream (RKE2/RKE1/K3s) cluster in Rancher -AND Deploy Longhorn using either of Kubectl/helm/marketplace app -When Click the Longhorn app on Rancher UI -Then Navigates to Longhorn UI -AND User should be to do all the operations available on the Longhorn GUI -AND URL should be a suffix to the Rancher URL -AND NO error in the console logs + Given Downstream (RKE2/RKE1/K3s) cluster in Rancher AND Deploy Longhorn using either of Kubectl/helm/marketplace app When Click the Longhorn app on Rancher UI Then Navigates to Longhorn UI AND User should be to do all the operations available on the Longhorn GUI AND URL should be a suffix to the Rancher URL AND NO error in the console logs Drain using Rancher UI https://longhorn.github.io/longhorn-tests/manual/rancher-integration/drain-using-rancher-ui/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/drain-using-rancher-ui/ - Note: Enabling Delete Empty Dir Data is mandatory to drain a node if a pod is associated with any storage. -Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. Drain operation on single node using Rancher UI Given Single node (1 Worker) cluster with Longhorn installed -AND few RWO and RWX volumes attached with node/pod exists -AND 1 RWO and 1 RWX volumes unattached -When Drain the node with default values of Rancher UI + Note: Enabling Delete Empty Dir Data is mandatory to drain a node if a pod is associated with any storage. Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. 
Drain operation on single node using Rancher UI Given Single node (1 Worker) cluster with Longhorn installed AND few RWO and RWX volumes attached with node/pod exists AND 1 RWO and 1 RWX volumes unattached When Drain the node with default values of Rancher UI Longhorn in an hardened cluster https://longhorn.github.io/longhorn-tests/manual/rancher-integration/lh-hardend-rancher/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/lh-hardend-rancher/ - Given Hardened Downstream (RKE2/RKE1/K3s) cluster in Rancher v2.6.x with CIS 1.6 as per Hardening guide -When Deploy Longhorn using Marketplace app -Then Longhorn should be deployed properly -AND Volume creation and other operations should work fine -Given Hardened Downstream (RKE2/RKE1/K3s) cluster in Rancher v2.7.x with CIS 1.6 or 1.20 or 1.23 as per Hardending guide for Rancher 2.7 -When Deploy Longhorn using Marketplace app -Then Longhorn should be deployed properly + Given Hardened Downstream (RKE2/RKE1/K3s) cluster in Rancher v2.6.x with CIS 1.6 as per Hardening guide When Deploy Longhorn using Marketplace app Then Longhorn should be deployed properly AND Volume creation and other operations should work fine Given Hardened Downstream (RKE2/RKE1/K3s) cluster in Rancher v2.7.x with CIS 1.6 or 1.20 or 1.23 as per Hardending guide for Rancher 2.7 When Deploy Longhorn using Marketplace app Then Longhorn should be deployed properly Longhorn using fleet on multiple downstream clusters https://longhorn.github.io/longhorn-tests/manual/rancher-integration/fleet-deploy/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/fleet-deploy/ - reference: https://github.com/rancher/fleet -Test Longhorn deployment using fleet: Given Downstream multiple (RKE2/RKE1/K3s) clusters in Rancher -When Use fleet to deploy Longhorn -Then Longhorn should be deployed to all the cluster -AND Longhorn UI should be accessible using Rancher proxy -Test Longhorn uninstall using fleet: Given Downstream multiple (RKE2/RKE1/K3s) clusters in Rancher -AND Longhorn is deployed on all the clusters using fleet -When Use fleet to uninstall Longhorn -Then Longhorn should be uninstalled from all the cluster + reference: https://github.com/rancher/fleet Test Longhorn deployment using fleet: Given Downstream multiple (RKE2/RKE1/K3s) clusters in Rancher When Use fleet to deploy Longhorn Then Longhorn should be deployed to all the cluster AND Longhorn UI should be accessible using Rancher proxy Test Longhorn uninstall using fleet: Given Downstream multiple (RKE2/RKE1/K3s) clusters in Rancher AND Longhorn is deployed on all the clusters using fleet When Use fleet to uninstall Longhorn Then Longhorn should be uninstalled from all the cluster Upgrade Kubernetes using Rancher UI https://longhorn.github.io/longhorn-tests/manual/rancher-integration/upgrade-using-rancher-ui/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/upgrade-using-rancher-ui/ - Note: Longhorn version v1.3.x doesn&rsquo;t support Kubernetes v1.25 and onwards -Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. 
Upgrade single node cluster using Rancher UI - RKE2 cluster Given Single node RKE2 cluster provisioned in Rancher with K8s prior version with Longhorn installed -AND few RWO and RWX volumes attached with node/pod exists -AND 1 RWO and 1 RWX volumes unattached -AND 1 RWO volume with 50 Gi data + Note: Longhorn version v1.3.x doesn&rsquo;t support Kubernetes v1.25 and onwards Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. Upgrade single node cluster using Rancher UI - RKE2 cluster Given Single node RKE2 cluster provisioned in Rancher with K8s prior version with Longhorn installed AND few RWO and RWX volumes attached with node/pod exists AND 1 RWO and 1 RWX volumes unattached AND 1 RWO volume with 50 Gi data Upgrade Kubernetes using SUC https://longhorn.github.io/longhorn-tests/manual/rancher-integration/upgrade-using-suc/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/rancher-integration/upgrade-using-suc/ - Note: Longhorn version v1.3.x doesn&rsquo;t support Kubernetes v1.25 and onwards -Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. Upgrade multi node cluster using SUC - K3s cluster Given Multi node (1 master and 3 worker) K3s cluster (not provisioned by Rancher) with K3s prior version with Longhorn installed -AND System Upgrade Controller deployed -AND few RWO and RWX volumes attached with node/pod exists -AND 1 RWO and 1 RWX volumes unattached + Note: Longhorn version v1.3.x doesn&rsquo;t support Kubernetes v1.25 and onwards Test with Longhorn default setting of &lsquo;Node Drain Policy&rsquo;: block-if-contains-last-replica 1. Upgrade multi node cluster using SUC - K3s cluster Given Multi node (1 master and 3 worker) K3s cluster (not provisioned by Rancher) with K3s prior version with Longhorn installed AND System Upgrade Controller deployed AND few RWO and RWX volumes attached with node/pod exists AND 1 RWO and 1 RWX volumes unattached diff --git a/manual/release-specific/v1.0.1/index.xml b/manual/release-specific/v1.0.1/index.xml index 04277427a5..d23d6cdf01 100644 --- a/manual/release-specific/v1.0.1/index.xml +++ b/manual/release-specific/v1.0.1/index.xml @@ -26,8 +26,7 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/dr-volume-latest-backup-deletion/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/dr-volume-latest-backup-deletion/ - DR volume keeps getting the latest update from the related backups. Edge cases where the latest backup is deleted can be test as below. -Case 1: Create a volume and take multiple backups for the same. Delete the latest backup. Create another cluster and set the same backup store to access the backups created in step 1. Go to backup page and click on the backup. Verify the Create Disaster Recovery option is enabled for it. + DR volume keeps getting the latest update from the related backups. Edge cases where the latest backup is deleted can be test as below. Case 1: Create a volume and take multiple backups for the same. Delete the latest backup. Create another cluster and set the same backup store to access the backups created in step 1. Go to backup page and click on the backup. Verify the Create Disaster Recovery option is enabled for it. NFSv4 Enforcement (No NFSv3 Fallback) @@ -41,16 +40,14 @@ Case 1: Create a volume and take multiple backups for the same. 
Delete the lates https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/priorityclass-default-setting/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/priorityclass-default-setting/ - There are three different cases we need to test when the user inputs a default setting for Priority Class: -Install Longhorn with no priority-class set in the default settings. The Priority Class setting should be empty after the installation completes according to the longhorn-ui, and the default Priority of all Pods in the longhorn-system namespace should be 0: ~ kubectl -n longhorn-system describe pods | grep Priority # should be repeated many times Priority: 0 Install Longhorn with a nonexistent priority-class in the default settings. + There are three different cases we need to test when the user inputs a default setting for Priority Class: Install Longhorn with no priority-class set in the default settings. The Priority Class setting should be empty after the installation completes according to the longhorn-ui, and the default Priority of all Pods in the longhorn-system namespace should be 0: ~ kubectl -n longhorn-system describe pods | grep Priority # should be repeated many times Priority: 0 Install Longhorn with a nonexistent priority-class in the default settings. Return an error when fail to remount a volume https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/error-fail-remount/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/error-fail-remount/ - Case 1: Volume with a corrupted filesystem try to remount Steps to reproduce bug: -Create a volume of size 1GB, say terminate-immediatly volume. Create PV/PVC from the volume terminate-immediatly Create a deployment of 1 pod with image ubuntu:xenial and the PVC terminate-immediatly in default namespace Find the node on which the pod is scheduled to. Let&rsquo;s say the node is Node-1 ssh into Node-1 destroy the filesystem of terminate-immediatly by running command dd if=/dev/zero of=/dev/longhorn/terminate-immediatly Find and kill the engine instance manager in Node-X. + Case 1: Volume with a corrupted filesystem try to remount Steps to reproduce bug: Create a volume of size 1GB, say terminate-immediatly volume. Create PV/PVC from the volume terminate-immediatly Create a deployment of 1 pod with image ubuntu:xenial and the PVC terminate-immediatly in default namespace Find the node on which the pod is scheduled to. Let&rsquo;s say the node is Node-1 ssh into Node-1 destroy the filesystem of terminate-immediatly by running command dd if=/dev/zero of=/dev/longhorn/terminate-immediatly Find and kill the engine instance manager in Node-X. Test access style for S3 compatible backupstore @@ -64,18 +61,14 @@ Create a volume of size 1GB, say terminate-immediatly volume. 
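The Priority check in the priority-class cases above can be done in one pass with standard kubectl custom columns instead of grepping describe output.

    kubectl -n longhorn-system get pods \
      -o custom-columns=NAME:.metadata.name,PRIORITY_CLASS:.spec.priorityClassName,PRIORITY:.spec.priority
    # For the "no priority-class" installation, every PRIORITY value should be 0.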
Create PV/PVC from https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/test-s3-backupstore/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/test-s3-backupstore/ - Related issue: 3136 -Requirement: -Set up a stand alone Squid, HTTP web proxy To configure Squid proxy: a comment about squid config If setting up instance on AWS: a EC2 security group setting S3 with existing backups Steps: -Create credential for Backup Target $ secret_name=&#34;aws-secret-proxy&#34; $ proxy_ip=123.123.123.123 $ no_proxy_params=&#34;localhost,127.0.0.1,0.0.0.0,10.0.0.0/8,192.168.0.0/16&#34; $ kubectl create secret generic $secret_name \ --from-literal=AWS_ACCESS_KEY_ID=$AWS_ID \ --from-literal=AWS_SECRET_ACCESS_KEY=$AWS_KEY \ --from-literal=HTTP_PROXY=$proxy_ip:3128 \ --from-literal=HTTPS_PROXY=$proxy_ip:3128 \ --from-literal=NO_PROXY=$no_proxy_params \ -n longhorn-system Open Longhorn UI Click on Setting Scroll down to Backup Target Credential Secret Fill in $secret_name assigned in step 1. + Related issue: 3136 Requirement: Set up a stand alone Squid, HTTP web proxy To configure Squid proxy: a comment about squid config If setting up instance on AWS: a EC2 security group setting S3 with existing backups Steps: Create credential for Backup Target $ secret_name=&#34;aws-secret-proxy&#34; $ proxy_ip=123.123.123.123 $ no_proxy_params=&#34;localhost,127.0.0.1,0.0.0.0,10.0.0.0/8,192.168.0.0/16&#34; $ kubectl create secret generic $secret_name \ --from-literal=AWS_ACCESS_KEY_ID=$AWS_ID \ --from-literal=AWS_SECRET_ACCESS_KEY=$AWS_KEY \ --from-literal=HTTP_PROXY=$proxy_ip:3128 \ --from-literal=HTTPS_PROXY=$proxy_ip:3128 \ --from-literal=NO_PROXY=$no_proxy_params \ -n longhorn-system Open Longhorn UI Click on Setting Scroll down to Backup Target Credential Secret Fill in $secret_name assigned in step 1. Volume Deletion UI Warnings https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/ui-volume-deletion/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.1/ui-volume-deletion/ - A number of cases need to be manually tested in longhorn-ui. To test these cases, create the Volume with the specified conditions in each case, and then try to delete it. What is observed should match what is described in the test case: -A regular Volume. Only the default deletion prompt should show up asking to confirm deletion. A Volume with a Persistent Volume. The deletion prompt should tell the user that there is a Persistent Volume that will be deleted along with the Volume. + A number of cases need to be manually tested in longhorn-ui. To test these cases, create the Volume with the specified conditions in each case, and then try to delete it. What is observed should match what is described in the test case: A regular Volume. Only the default deletion prompt should show up asking to confirm deletion. A Volume with a Persistent Volume. The deletion prompt should tell the user that there is a Persistent Volume that will be deleted along with the Volume. 
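For the S3-with-proxy case above, the backup target can also be configured from the CLI after the secret is created; a sketch assuming the backup-target and backup-target-credential-secret settings.longhorn.io objects accept a direct value patch, with the bucket and region left as placeholders.

    kubectl -n longhorn-system patch settings.longhorn.io backup-target \
      --type merge -p '{"value":"s3://<your-bucket>@<region>/"}'
    kubectl -n longhorn-system patch settings.longhorn.io backup-target-credential-secret \
      --type merge -p "{\"value\":\"${secret_name}\"}"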
diff --git a/manual/release-specific/v1.0.2/index.xml b/manual/release-specific/v1.0.2/index.xml index 93eb6ec0bf..144db9f3ef 100644 --- a/manual/release-specific/v1.0.2/index.xml +++ b/manual/release-specific/v1.0.2/index.xml @@ -12,8 +12,7 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.2/upgrade-lease-lock/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.0.2/upgrade-lease-lock/ - The time it takes between the Longhorn Manager starting up and the upgrade completing for that Longhorn Manager can be used to determine if the upgrade lock was released correctly: -Create a fresh Longhorn installation or delete all of the Longhorn Manager Pods in the existing installation. Check the logs for the Longhorn Manager Pods and note the timestamps for the first line in the log and the timestamp for when the upgrade has completed. + The time it takes between the Longhorn Manager starting up and the upgrade completing for that Longhorn Manager can be used to determine if the upgrade lock was released correctly: Create a fresh Longhorn installation or delete all of the Longhorn Manager Pods in the existing installation. Check the logs for the Longhorn Manager Pods and note the timestamps for the first line in the log and the timestamp for when the upgrade has completed. diff --git a/manual/release-specific/v1.1.0/index.xml b/manual/release-specific/v1.1.0/index.xml index faa5b7540d..497ab26ff3 100644 --- a/manual/release-specific/v1.1.0/index.xml +++ b/manual/release-specific/v1.1.0/index.xml @@ -12,16 +12,14 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/prometheus_support/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/prometheus_support/ - Prometheus Support allows user to monitor the longhorn metrics. The details are available at https://longhorn.io/docs/1.1.0/monitoring/ -Monitor longhorn Deploy the Prometheus-operator, ServiceMonitor pointing to longhorn-backend and Prometheus as mentioned in the doc. Create an ingress pointing to Prometheus service. Access the Prometheus web UI using the ingress created in the step 2. Select the metrics from below to monitor the longhorn resources. longhorn_volume_actual_size_bytes longhorn_volume_capacity_bytes longhorn_volume_robustness longhorn_volume_state longhorn_instance_manager_cpu_requests_millicpu longhorn_instance_manager_cpu_usage_millicpu longhorn_instance_manager_memory_requests_bytes longhorn_instance_manager_memory_usage_bytes longhorn_manager_cpu_usage_millicpu longhorn_manager_memory_usage_bytes longhorn_node_count_total longhorn_node_status longhorn_node_cpu_capacity_millicpu longhorn_node_cpu_usage_millicpu longhorn_node_memory_capacity_bytes longhorn_node_memory_usage_bytes longhorn_node_storage_capacity_bytes longhorn_node_storage_reservation_bytes longhorn_node_storage_usage_bytes longhorn_disk_capacity_bytes longhorn_disk_reservation_bytes longhorn_disk_usage_bytes Deploy workloads which use Longhorn volumes into the cluster. + Prometheus Support allows user to monitor the longhorn metrics. The details are available at https://longhorn.io/docs/1.1.0/monitoring/ Monitor longhorn Deploy the Prometheus-operator, ServiceMonitor pointing to longhorn-backend and Prometheus as mentioned in the doc. Create an ingress pointing to Prometheus service. Access the Prometheus web UI using the ingress created in the step 2. Select the metrics from below to monitor the longhorn resources. 
longhorn_volume_actual_size_bytes longhorn_volume_capacity_bytes longhorn_volume_robustness longhorn_volume_state longhorn_instance_manager_cpu_requests_millicpu longhorn_instance_manager_cpu_usage_millicpu longhorn_instance_manager_memory_requests_bytes longhorn_instance_manager_memory_usage_bytes longhorn_manager_cpu_usage_millicpu longhorn_manager_memory_usage_bytes longhorn_node_count_total longhorn_node_status longhorn_node_cpu_capacity_millicpu longhorn_node_cpu_usage_millicpu longhorn_node_memory_capacity_bytes longhorn_node_memory_usage_bytes longhorn_node_storage_capacity_bytes longhorn_node_storage_reservation_bytes longhorn_node_storage_usage_bytes longhorn_disk_capacity_bytes longhorn_disk_reservation_bytes longhorn_disk_usage_bytes Deploy workloads which use Longhorn volumes into the cluster. Recurring backup job interruptions https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/recurring-backup-job-interruptions/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/recurring-backup-job-interruptions/ - Related Issue https://github.com/longhorn/longhorn/issues/1882 -Scenario 1- Allow Recurring Job While Volume Is Detached disabled, attached pod scaled down while the recurring backup was in progress. Create a volume, attach to a pod of a statefulSet, and write 800 Mi data into it. Set a recurring job. While the recurring job is in progress, scale down the pod to 0 of the statefulSet. Volume first detached and cron job gets finished saying unable to complete the backup. + Related Issue https://github.com/longhorn/longhorn/issues/1882 Scenario 1- Allow Recurring Job While Volume Is Detached disabled, attached pod scaled down while the recurring backup was in progress. Create a volume, attach to a pod of a statefulSet, and write 800 Mi data into it. Set a recurring job. While the recurring job is in progress, scale down the pod to 0 of the statefulSet. Volume first detached and cron job gets finished saying unable to complete the backup. Reusing failed replica for rebuilding @@ -35,17 +33,14 @@ Scenario 1- Allow Recurring Job While Volume Is Detached disabled, attached pod https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/kubelet_volume_metrics/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/kubelet_volume_metrics/ - Intro Kubelet exposes kubelet_volume_stats_* metrics. Those metrics measure PVC&rsquo;s filesystem related information inside a Longhorn block device. -Test steps: Create a cluster and set up this monitoring system: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack Install Longhorn. Deploy some workloads using Longhorn volumes. Make sure there are some workloads using Longhorn PVCs in volumeMode: Block and some workloads using Longhorn PVCs in volumeMode: Filesystem. See https://longhorn.io/docs/1.0.2/references/examples/ for examples. Create ingress to Prometheus server and Grafana. Navigate to Prometheus server, verify that all Longhorn PVCs in volumeMode: Filesystem show up in metrics: kubelet_volume_stats_capacity_bytes kubelet_volume_stats_available_bytes kubelet_volume_stats_used_bytes kubelet_volume_stats_inodes kubelet_volume_stats_inodes_free kubelet_volume_stats_inodes_used. + Intro Kubelet exposes kubelet_volume_stats_* metrics. Those metrics measure PVC&rsquo;s filesystem related information inside a Longhorn block device. 
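For the Prometheus support case above, the ServiceMonitor pointing at longhorn-backend looks roughly like the following; the port name, labels, and namespace are recalled from the linked monitoring docs and should be re-checked there.

    kubectl apply -f - <<EOF
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: longhorn-prometheus-servicemonitor
      namespace: monitoring            # wherever the Prometheus operator watches
      labels:
        name: longhorn-prometheus-servicemonitor
    spec:
      selector:
        matchLabels:
          app: longhorn-manager        # matches the longhorn-backend service
      namespaceSelector:
        matchNames:
          - longhorn-system
      endpoints:
        - port: manager
    EOF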
Test steps: Create a cluster and set up this monitoring system: https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack Install Longhorn. Deploy some workloads using Longhorn volumes. Make sure there are some workloads using Longhorn PVCs in volumeMode: Block and some workloads using Longhorn PVCs in volumeMode: Filesystem. See https://longhorn.io/docs/1.0.2/references/examples/ for examples. Create ingress to Prometheus server and Grafana. Navigate to Prometheus server, verify that all Longhorn PVCs in volumeMode: Filesystem show up in metrics: kubelet_volume_stats_capacity_bytes kubelet_volume_stats_available_bytes kubelet_volume_stats_used_bytes kubelet_volume_stats_inodes kubelet_volume_stats_inodes_free kubelet_volume_stats_inodes_used. Test Additional Printer Columns https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/additional-printer-columns/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/additional-printer-columns/ - For each of the case below: -Fresh installation of Longhorn. (make sure to delete all Longhorn CRDs before installation) Upgrade from older version. Run: -kubectl get &lt;LONGHORN-CRD&gt; -n longhorn-system Verify that the output contains information as specify in the additionalPrinerColumns at here + For each of the case below: Fresh installation of Longhorn. (make sure to delete all Longhorn CRDs before installation) Upgrade from older version. Run: kubectl get &lt;LONGHORN-CRD&gt; -n longhorn-system Verify that the output contains information as specify in the additionalPrinerColumns at here Test Instance Manager IP Sync @@ -59,9 +54,7 @@ kubectl get &lt;LONGHORN-CRD&gt; -n longhorn-system Verify that the outp https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/iscsi_installation/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/iscsi_installation/ - This is for EKS or similar users who doesn&rsquo;t need to log into each host to install &lsquo;ISCSI&rsquo; individually. -Test steps: -Create an EKS cluster with 3 nodes. Run the following command to install iscsi on every nodes. kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/iscsi/longhorn-iscsi-installation.yaml In Longhorn Manager Repo Directory run: kubectl apply -Rf ./deploy/install/ Longhorn should be able installed successfully. Try to create a pod with a pvc: kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/examples/simple_pvc.yaml kubectl apply -f https://raw. + This is for EKS or similar users who doesn&rsquo;t need to log into each host to install &lsquo;ISCSI&rsquo; individually. Test steps: Create an EKS cluster with 3 nodes. Run the following command to install iscsi on every nodes. kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/iscsi/longhorn-iscsi-installation.yaml In Longhorn Manager Repo Directory run: kubectl apply -Rf ./deploy/install/ Longhorn should be able installed successfully. Try to create a pod with a pvc: kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/examples/simple_pvc.yaml kubectl apply -f https://raw. Test Read Write Many Feature @@ -75,18 +68,14 @@ Create an EKS cluster with 3 nodes. 
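The additionalPrinterColumns check above can be looped over every longhorn.io CRD rather than run one kind at a time.

    for crd in $(kubectl api-resources --api-group=longhorn.io -o name); do
      echo "== ${crd}"
      kubectl -n longhorn-system get "${crd}"
    done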
Run the following command to install iscsi o https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/uninstallation/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/uninstallation/ - Stability of uninstallation Launch Longhorn system. -Use scripts to continuously create then delete multiple DaemonSets. -e.g., putting the following python test into the manager integration test directory and run it: from common import get_apps_api_client # NOQA def test_uninstall_script(): apps_api = get_apps_api_client() while True: for i in range(10): name = &#34;ds-&#34; + str(i) try: ds = apps_api.read_namespaced_daemon_set(name, &#34;default&#34;) if ds.status.number_ready == ds.status.number_ready: apps_api.delete_namespaced_daemon_set(name, &#34;default&#34;) except Exception: apps_api.create_namespaced_daemon_set( &#34;default&#34;, ds_manifest(name)) def ds_manifest(name): return { &#39;apiVersion&#39;: &#39;apps/v1&#39;, &#39;kind&#39;: &#39;DaemonSet&#39;, &#39;metadata&#39;: { &#39;name&#39;: name }, &#39;spec&#39;: { &#39;selector&#39;: { &#39;matchLabels&#39;: { &#39;app&#39;: name } }, &#39;template&#39;: { &#39;metadata&#39;: { &#39;labels&#39;: { &#39;app&#39;: name } }, &#39;spec&#39;: { &#39;terminationGracePeriodSeconds&#39;: 10, &#39;containers&#39;: [{ &#39;image&#39;: &#39;busybox&#39;, &#39;imagePullPolicy&#39;: &#39;IfNotPresent&#39;, &#39;name&#39;: &#39;sleep&#39;, &#39;args&#39;: [ &#39;/bin/sh&#39;, &#39;-c&#39;, &#39;while true;do date;sleep 5; done&#39; ], }] } }, } } Start to uninstall longhorn. + Stability of uninstallation Launch Longhorn system. Use scripts to continuously create then delete multiple DaemonSets. e.g., putting the following python test into the manager integration test directory and run it: from common import get_apps_api_client # NOQA def test_uninstall_script(): apps_api = get_apps_api_client() while True: for i in range(10): name = &#34;ds-&#34; + str(i) try: ds = apps_api.read_namespaced_daemon_set(name, &#34;default&#34;) if ds.status.number_ready == ds.status.number_ready: apps_api.delete_namespaced_daemon_set(name, &#34;default&#34;) except Exception: apps_api.create_namespaced_daemon_set( &#34;default&#34;, ds_manifest(name)) def ds_manifest(name): return { &#39;apiVersion&#39;: &#39;apps/v1&#39;, &#39;kind&#39;: &#39;DaemonSet&#39;, &#39;metadata&#39;: { &#39;name&#39;: name }, &#39;spec&#39;: { &#39;selector&#39;: { &#39;matchLabels&#39;: { &#39;app&#39;: name } }, &#39;template&#39;: { &#39;metadata&#39;: { &#39;labels&#39;: { &#39;app&#39;: name } }, &#39;spec&#39;: { &#39;terminationGracePeriodSeconds&#39;: 10, &#39;containers&#39;: [{ &#39;image&#39;: &#39;busybox&#39;, &#39;imagePullPolicy&#39;: &#39;IfNotPresent&#39;, &#39;name&#39;: &#39;sleep&#39;, &#39;args&#39;: [ &#39;/bin/sh&#39;, &#39;-c&#39;, &#39;while true;do date;sleep 5; done&#39; ], }] } }, } } Start to uninstall longhorn. Upgrade Longhorn with modified Storage Class https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/upgrade_with_modified_storageclass/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.0/upgrade_with_modified_storageclass/ - Intro Longhorn can be upgraded with modified Storage Class. 
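After applying the iscsi installer DaemonSet in the case above, one way to confirm every node ended up with a working iscsid is the upstream environment check script; the pod label below is an assumption about the installer manifest.

    kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/iscsi/longhorn-iscsi-installation.yaml
    kubectl get pods --all-namespaces -l app=longhorn-iscsi-installation      # assumed label on the installer pods
    curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/master/scripts/environment_check.sh | bash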
-Related Issue https://github.com/longhorn/longhorn/issues/1527 -Test steps: Kubectl apply -f Install Longhorn v1.0.2 kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.0.2/deploy/longhorn.yaml Create a statefulset using longhorn storageclass for PVCs. Set the scale to 1. Observe that there is a workload pod (pod-1) is using 1 volume (vol-1) with 3 replicas. In Longhorn repo, on master branch. Modify numberOfReplicas: &quot;1&quot; in https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml. Upgrade Longhorn to master by running kubectl apply -f https://raw. + Intro Longhorn can be upgraded with modified Storage Class. Related Issue https://github.com/longhorn/longhorn/issues/1527 Test steps: Kubectl apply -f Install Longhorn v1.0.2 kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.0.2/deploy/longhorn.yaml Create a statefulset using longhorn storageclass for PVCs. Set the scale to 1. Observe that there is a workload pod (pod-1) is using 1 volume (vol-1) with 3 replicas. In Longhorn repo, on master branch. Modify numberOfReplicas: &quot;1&quot; in https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml. Upgrade Longhorn to master by running kubectl apply -f https://raw. diff --git a/manual/release-specific/v1.1.1/index.xml b/manual/release-specific/v1.1.1/index.xml index 15fe660bf3..a3c03b8a18 100644 --- a/manual/release-specific/v1.1.1/index.xml +++ b/manual/release-specific/v1.1.1/index.xml @@ -12,87 +12,63 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/csi-sanity-check/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/csi-sanity-check/ - Related issue https://github.com/longhorn/longhorn/issues/2076 -Run csi-sanity Prepare Longhorn cluster and setup backup target. -Make csi-sanity binary from csi-test. -On one of the cluster node, run csi-sanity binary. -csi-sanity -csi.endpoint /var/lib/kubelet/obsoleted-longhorn-plugins/driver.longhorn.io/csi.sock -ginkgo.skip=&#34;should create volume from an existing source snapshot|should return appropriate values|should succeed when creating snapshot with maximum-length name|should succeed when requesting to create a snapshot with already existing name and same source volume ID|should fail when requesting to create a snapshot with already existing name and different source volume ID&#34; NOTE + Related issue https://github.com/longhorn/longhorn/issues/2076 Run csi-sanity Prepare Longhorn cluster and setup backup target. Make csi-sanity binary from csi-test. On one of the cluster node, run csi-sanity binary. 
csi-sanity -csi.endpoint /var/lib/kubelet/obsoleted-longhorn-plugins/driver.longhorn.io/csi.sock -ginkgo.skip=&#34;should create volume from an existing source snapshot|should return appropriate values|should succeed when creating snapshot with maximum-length name|should succeed when requesting to create a snapshot with already existing name and same source volume ID|should fail when requesting to create a snapshot with already existing name and different source volume ID&#34; NOTE Longhorn with engine is not deployed on all the nodes https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/partial-engine-deployment/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/partial-engine-deployment/ - Related Issue https://github.com/longhorn/longhorn/issues/2081 -Scenarios: Case 1: Test volume operations when some of the engine image DaemonSet pods are miss scheduled Install Longhorn in a 3-node cluster: node-1, node-2, node-3 Create a volume, vol-1, of 3 replicas Create another volume, vol-2, of 3 replicas Taint node-1 with the taint: key=value:NoSchedule Check that all functions (attach, detach, snapshot, backup, expand, restore, creating DR volume, &hellip; ) are working ok for vol-1 Case 2: Test volume operations when some of engine image DaemonSet pods are not fully deployed Continue from case 1 Attach vol-1 to node-1. + Related Issue https://github.com/longhorn/longhorn/issues/2081 Scenarios: Case 1: Test volume operations when some of the engine image DaemonSet pods are miss scheduled Install Longhorn in a 3-node cluster: node-1, node-2, node-3 Create a volume, vol-1, of 3 replicas Create another volume, vol-2, of 3 replicas Taint node-1 with the taint: key=value:NoSchedule Check that all functions (attach, detach, snapshot, backup, expand, restore, creating DR volume, &hellip; ) are working ok for vol-1 Case 2: Test volume operations when some of engine image DaemonSet pods are not fully deployed Continue from case 1 Attach vol-1 to node-1. Set Tolerations/PriorityClass For System Components https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/tolerations_priorityclass_setting/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/tolerations_priorityclass_setting/ - Related issue https://github.com/longhorn/longhorn/issues/2120 -Manual Tests: -Case 1: Existing Longhorn installation Install Longhorn master. Change toleration in UI setting Verify that longhorn.io/last-applied-tolerations annotation and toleration of manager, drive deployer, UI are not changed. Verify that longhorn.io/last-applied-tolerations annotation and toleration for managed components (CSI components, IM pods, share manager pod, EI daemonset, backing-image-manager, cronjob) are updated correctly Case 2: New installation by Helm Install Longhorn master, set tolerations like: defaultSettings: taintToleration: &#34;key=value:NoSchedule&#34; longhornManager: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule longhornDriver: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule longhornUI: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule Verify that the toleration is added for: IM pods, Share Manager pods, CSI deployments, CSI daemonset, the backup jobs, manager, drive deployer, UI Uninstall the Helm release. 
+ Related issue https://github.com/longhorn/longhorn/issues/2120 Manual Tests: Case 1: Existing Longhorn installation Install Longhorn master. Change toleration in UI setting Verify that longhorn.io/last-applied-tolerations annotation and toleration of manager, drive deployer, UI are not changed. Verify that longhorn.io/last-applied-tolerations annotation and toleration for managed components (CSI components, IM pods, share manager pod, EI daemonset, backing-image-manager, cronjob) are updated correctly Case 2: New installation by Helm Install Longhorn master, set tolerations like: defaultSettings: taintToleration: &#34;key=value:NoSchedule&#34; longhornManager: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule longhornDriver: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule longhornUI: priorityClass: ~ tolerations: - key: key operator: Equal value: value effect: NoSchedule Verify that the toleration is added for: IM pods, Share Manager pods, CSI deployments, CSI daemonset, the backup jobs, manager, drive deployer, UI Uninstall the Helm release. Test Disable IPv6 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/disable_ipv6/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/disable_ipv6/ - Related issue https://github.com/longhorn/longhorn/issues/2136 -https://github.com/longhorn/longhorn/issues/2197 -Longhorn v1.1.1 should work with IPv6 disabled. -Scenario Install Kubernetes Disable IPv6 on all the worker nodes using the following Go to the folder /etc/default In the grub file, edit the value GRUB_CMDLINE_LINUX_DEFAULT=&#34;ipv6.disable=1&#34; Once the file is saved update by the command update-grub Reboot the node and once the node becomes active, Use the command cat /proc/cmdline to verify &#34;ipv6.disable=1&#34; is reflected in the values Deploy Longhorn and test basic use cases. + Related issue https://github.com/longhorn/longhorn/issues/2136 https://github.com/longhorn/longhorn/issues/2197 Longhorn v1.1.1 should work with IPv6 disabled. Scenario Install Kubernetes Disable IPv6 on all the worker nodes using the following Go to the folder /etc/default In the grub file, edit the value GRUB_CMDLINE_LINUX_DEFAULT=&#34;ipv6.disable=1&#34; Once the file is saved update by the command update-grub Reboot the node and once the node becomes active, Use the command cat /proc/cmdline to verify &#34;ipv6.disable=1&#34; is reflected in the values Deploy Longhorn and test basic use cases. 
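A minimal per-node sketch of the GRUB steps above, assuming an Ubuntu/Debian-style node where update-grub is available (other distros use grub2-mkconfig); the sed expression that appends ipv6.disable=1 is illustrative, not part of the original test case:
# Append ipv6.disable=1 to the kernel command line and regenerate the GRUB config.
sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="\(.*\)"/GRUB_CMDLINE_LINUX_DEFAULT="\1 ipv6.disable=1"/' /etc/default/grub
sudo update-grub
sudo reboot
# After the node is back, confirm the flag took effect.
cat /proc/cmdline | grep -o ipv6.disable=1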
Test File Sync Cancellation https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/test-file-sync-cancellation/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/test-file-sync-cancellation/ - Related issue https://github.com/longhorn/longhorn/issues/2416 -Test step For test convenience, manually launch the backing image manager pods: apiVersion: apps/v1 kind: DaemonSet metadata: labels: app: backing-image-manager name: backing-image-manager namespace: longhorn-system spec: selector: matchLabels: app: backing-image-manager template: metadata: labels: app: backing-image-manager spec: containers: - name: backing-image-manager image: longhornio/backing-image-manager:master imagePullPolicy: Always securityContext: privileged: true command: - backing-image-manager - --debug - daemon - --listen - 0.0.0.0:8000 readinessProbe: tcpSocket: port: 8000 volumeMounts: - name: disk-path mountPath: /data volumes: - name: disk-path hostPath: path: /var/lib/longhorn/ serviceAccountName: longhorn-service-account Download a backing image in the first pod: # alias bm=&#34;backing-image-manager backing-image&#34; # bm pull --name bi-test --uuid uuid-bi-test --download-url https://cloud-images. + Related issue https://github.com/longhorn/longhorn/issues/2416 Test step For test convenience, manually launch the backing image manager pods: apiVersion: apps/v1 kind: DaemonSet metadata: labels: app: backing-image-manager name: backing-image-manager namespace: longhorn-system spec: selector: matchLabels: app: backing-image-manager template: metadata: labels: app: backing-image-manager spec: containers: - name: backing-image-manager image: longhornio/backing-image-manager:master imagePullPolicy: Always securityContext: privileged: true command: - backing-image-manager - --debug - daemon - --listen - 0.0.0.0:8000 readinessProbe: tcpSocket: port: 8000 volumeMounts: - name: disk-path mountPath: /data volumes: - name: disk-path hostPath: path: /var/lib/longhorn/ serviceAccountName: longhorn-service-account Download a backing image in the first pod: # alias bm=&#34;backing-image-manager backing-image&#34; # bm pull --name bi-test --uuid uuid-bi-test --download-url https://cloud-images. Test Frontend Traffic https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/ws-traffic-flood/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/ws-traffic-flood/ - Related issue https://github.com/longhorn/longhorn/issues/2372 -Test Frontend Traffic Given 100 pvc created. -And all pvcs deployed and detached. -When monitor traffic in frontend pod with nload. -apk add nload nload Then should not see a continuing large amount of traffic when there is no operation happening. The smaller spikes are mostly coming from event resources which possibly could be enhanced later (https://github.com/longhorn/longhorn/issues/2433). + Related issue https://github.com/longhorn/longhorn/issues/2372 Test Frontend Traffic Given 100 pvc created. And all pvcs deployed and detached. When monitor traffic in frontend pod with nload. apk add nload nload Then should not see a continuing large amount of traffic when there is no operation happening. The smaller spikes are mostly coming from event resources which possibly could be enhanced later (https://github.com/longhorn/longhorn/issues/2433). 
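A small helper for the nload check above, assuming the Longhorn UI pod carries the label app=longhorn-ui and runs an Alpine-based image (both assumptions; the test case itself only says to run apk add nload inside the frontend pod):
# Exec into the frontend (longhorn-ui) pod and watch its traffic with nload.
UI_POD=$(kubectl -n longhorn-system get pod -l app=longhorn-ui -o jsonpath='{.items[0].metadata.name}')
kubectl -n longhorn-system exec -it "$UI_POD" -- sh -c 'apk add nload && nload'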
Test Node Delete https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/delete-node/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/delete-node/ - Related issue https://github.com/longhorn/longhorn/issues/2186 https://github.com/longhorn/longhorn/issues/2462 -Delete Method Should verify with both of the delete methods. -Bulk Delete - This is the Delete on the Node page. Node Delete - This is the Remove Node for each node Operation drop-down list. Test Node Delete - should grey out when node not down Given node not Down. -When Try to delete any node. -Then Should see button greyed out. -Test Node Delete Given pod with pvc created. + Related issue https://github.com/longhorn/longhorn/issues/2186 https://github.com/longhorn/longhorn/issues/2462 Delete Method Should verify with both of the delete methods. Bulk Delete - This is the Delete on the Node page. Node Delete - This is the Remove Node for each node Operation drop-down list. Test Node Delete - should grey out when node not down Given node not Down. When Try to delete any node. Then Should see button greyed out. Test Node Delete Given pod with pvc created. Test Node Selector https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/test-node-selector/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/test-node-selector/ - Prepare the cluster Using Rancher RKE to create a cluster of 2 Windows worker nodes and 3 Linux worker nodes. Rancher will add the taint cattle.io/os=linux:NoSchedule to Linux nodes Kubernetes will add label kubernetes.io/os:linux to Linux nodes Test steps Repeat the following steps for each type of Longhorn installation: Rancher, Helm, Kubectl: -Follow the Longhorn document at the PR https://github.com/longhorn/website/pull/287 to install Longhorn with toleration cattle.io/os=linux:NoSchedule and node selector kubernetes.io/os:linux Verify that Longhorn get deployed successfully on the 3 Linux nodes Verify all volume basic functionalities is working ok Create a volume of 3 replica named vol-1 Add label longhorn. + Prepare the cluster Using Rancher RKE to create a cluster of 2 Windows worker nodes and 3 Linux worker nodes. Rancher will add the taint cattle.io/os=linux:NoSchedule to Linux nodes Kubernetes will add label kubernetes.io/os:linux to Linux nodes Test steps Repeat the following steps for each type of Longhorn installation: Rancher, Helm, Kubectl: Follow the Longhorn document at the PR https://github.com/longhorn/website/pull/287 to install Longhorn with toleration cattle.io/os=linux:NoSchedule and node selector kubernetes.io/os:linux Verify that Longhorn get deployed successfully on the 3 Linux nodes Verify all volume basic functionalities is working ok Create a volume of 3 replica named vol-1 Add label longhorn. Test RWX share-mount ownership reset https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/rwx-mount-ownership-reset/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/rwx-mount-ownership-reset/ - Related issue https://github.com/longhorn/longhorn/issues/2357 -Test RWX share-mount ownership Given Setup one of cluster node to use host FQDN. 
-root@ip-172-30-0-139:/home/ubuntu# cat /etc/hosts 127.0.0.1 localhost 54.255.224.72 ip-172-30-0-139.lan ip-172-30-0-139 root@ip-172-30-0-139:/home/ubuntu# hostname ip-172-30-0-139 root@ip-172-30-0-139:/home/ubuntu# hostname -f ip-172-30-0-139.lan And Domain = localdomain is commented out in /etc/idmapd.conf on cluster hosts. This is to ensure localdomain is not enforce to sync between server and client. Ref: https://github.com/longhorn/website/pull/279 -root@ip-172-30-0-139:~# cat /etc/idmapd.conf [General] Verbosity = 0 Pipefs-Directory = /run/rpc_pipefs # set your own domain here, if it differs from FQDN minus hostname # Domain = localdomain [Mapping] Nobody-User = nobody Nobody-Group = nogroup And pod with rwx pvc deployed to the node with host FQDN. + Related issue https://github.com/longhorn/longhorn/issues/2357 Test RWX share-mount ownership Given Setup one of cluster node to use host FQDN. root@ip-172-30-0-139:/home/ubuntu# cat /etc/hosts 127.0.0.1 localhost 54.255.224.72 ip-172-30-0-139.lan ip-172-30-0-139 root@ip-172-30-0-139:/home/ubuntu# hostname ip-172-30-0-139 root@ip-172-30-0-139:/home/ubuntu# hostname -f ip-172-30-0-139.lan And Domain = localdomain is commented out in /etc/idmapd.conf on cluster hosts. This is to ensure localdomain is not enforce to sync between server and client. Ref: https://github.com/longhorn/website/pull/279 root@ip-172-30-0-139:~# cat /etc/idmapd.conf [General] Verbosity = 0 Pipefs-Directory = /run/rpc_pipefs # set your own domain here, if it differs from FQDN minus hostname # Domain = localdomain [Mapping] Nobody-User = nobody Nobody-Group = nogroup And pod with rwx pvc deployed to the node with host FQDN. Test Service Account mount on host @@ -106,17 +82,14 @@ root@ip-172-30-0-139:~# cat /etc/idmapd.conf [General] Verbosity = 0 Pipefs-Dire https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/snapshot-purge-error-handling/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/snapshot-purge-error-handling/ - Related issue https://github.com/longhorn/longhorn/issues/1895 -Longhorn v1.1.1 handles the error during snapshot purge better and reports to Longhorn-manager. -Scenario-1 Create a volume with 3 replicas and attach to a pod. Write some data into the volume and take a snapshot. Delete a replica that will result in creating a system generated snapshot. Wait for replica to finish and take another snapshot. ssh into a node and resize the latest snapshot. (e.g dd if=/dev/urandom count=50 bs=1M of=&lt;SNAPSHOT-NAME&gt;. + Related issue https://github.com/longhorn/longhorn/issues/1895 Longhorn v1.1.1 handles the error during snapshot purge better and reports to Longhorn-manager. Scenario-1 Create a volume with 3 replicas and attach to a pod. Write some data into the volume and take a snapshot. Delete a replica that will result in creating a system generated snapshot. Wait for replica to finish and take another snapshot. ssh into a node and resize the latest snapshot. (e.g dd if=/dev/urandom count=50 bs=1M of=&lt;SNAPSHOT-NAME&gt;. Test system upgrade with the deprecated CPU setting https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/system-upgrade-with-deprecated-cpu-setting/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.1/system-upgrade-with-deprecated-cpu-setting/ - Related issue https://github.com/longhorn/longhorn/issues/2207 -Test step Deploy a cluster that each node has different CPUs. 
Launch Longhorn v1.1.0. Deploy some workloads using Longhorn volumes. Upgrade to the latest Longhorn version. Validate: all workloads work fine and no instance manager pod crash during the upgrade. The fields node.Spec.EngineManagerCPURequest and node.Spec.ReplicaManagerCPURequest of each node are the same as the setting Guaranteed Engine CPU value in the old version * 1000. The old setting Guaranteed Engine CPU is deprecated with an empty value. + Related issue https://github.com/longhorn/longhorn/issues/2207 Test step Deploy a cluster that each node has different CPUs. Launch Longhorn v1.1.0. Deploy some workloads using Longhorn volumes. Upgrade to the latest Longhorn version. Validate: all workloads work fine and no instance manager pod crash during the upgrade. The fields node.Spec.EngineManagerCPURequest and node.Spec.ReplicaManagerCPURequest of each node are the same as the setting Guaranteed Engine CPU value in the old version * 1000. The old setting Guaranteed Engine CPU is deprecated with an empty value. diff --git a/manual/release-specific/v1.1.2/index.xml b/manual/release-specific/v1.1.2/index.xml index e6ac39008d..f683d2406d 100644 --- a/manual/release-specific/v1.1.2/index.xml +++ b/manual/release-specific/v1.1.2/index.xml @@ -12,30 +12,21 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/delete-cronjob-for-detached-volumes/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/delete-cronjob-for-detached-volumes/ - Related issue https://github.com/longhorn/longhorn/issues/2513 -Steps Make sure the setting Allow Recurring Job While Volume Is Detached is disabled Create a volume. Attach to a node. Create a recurring backup job that run every minute. Wait for the cronjob to be scheduled a few times. Detach the volume. Verify that the CronJob get deleted. Wait 2 hours (&gt; 100 mins). Attach the volume to a node. Verify that the CronJob get created. Verify that Kubernetes schedules a run for the CronJob at the beginning of the next minute. + Related issue https://github.com/longhorn/longhorn/issues/2513 Steps Make sure the setting Allow Recurring Job While Volume Is Detached is disabled Create a volume. Attach to a node. Create a recurring backup job that run every minute. Wait for the cronjob to be scheduled a few times. Detach the volume. Verify that the CronJob get deleted. Wait 2 hours (&gt; 100 mins). Attach the volume to a node. Verify that the CronJob get created. Verify that Kubernetes schedules a run for the CronJob at the beginning of the next minute. Test Frontend Web-socket Data Transfer When Resource Not Updated https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/full-ws-data-tranfer-when-no-updates/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/full-ws-data-tranfer-when-no-updates/ - Related issue https://github.com/longhorn/longhorn-manager/pull/918 https://github.com/longhorn/longhorn/issues/2646 https://github.com/longhorn/longhorn/issues/2591 -Test Data Send Over Web-socket When No Resource Updated Given 1 PVC/Pod created. And the Pod is not writing to the mounted volume. -When monitor network traffic with browser inspect tool. -Then wait for 3 mins And should not see data send over web-socket when there are no updates to the resources. -Test Data Send Over Web-socket Resource Updated Given monitor network traffic with browser inspect tool. 
+ Related issue https://github.com/longhorn/longhorn-manager/pull/918 https://github.com/longhorn/longhorn/issues/2646 https://github.com/longhorn/longhorn/issues/2591 Test Data Send Over Web-socket When No Resource Updated Given 1 PVC/Pod created. And the Pod is not writing to the mounted volume. When monitor network traffic with browser inspect tool. Then wait for 3 mins And should not see data send over web-socket when there are no updates to the resources. Test Data Send Over Web-socket Resource Updated Given monitor network traffic with browser inspect tool. Test Instance Manager Streaming Connection Recovery https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/instance-manager-streaming-connection-recovery/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.1.2/instance-manager-streaming-connection-recovery/ - Related issue https://github.com/longhorn/longhorn/issues/2561 -Test Step Given A cluster with Longhorn -And create a volume and attach it to a pod. -And exec into a longhorn manager pod and kill the connection with an engine or replica instance manager pod. The connections are instance manager pods&rsquo; IP with port 8500. -$ kl exec -it longhorn-manager-5z8zn -- bash root@longhorn-manager-5z8zn:/# ss Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port tcp ESTAB 0 0 10. + Related issue https://github.com/longhorn/longhorn/issues/2561 Test Step Given A cluster with Longhorn And create a volume and attach it to a pod. And exec into a longhorn manager pod and kill the connection with an engine or replica instance manager pod. The connections are instance manager pods&rsquo; IP with port 8500. $ kl exec -it longhorn-manager-5z8zn -- bash root@longhorn-manager-5z8zn:/# ss Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port tcp ESTAB 0 0 10. diff --git a/manual/release-specific/v1.2.0/index.xml b/manual/release-specific/v1.2.0/index.xml index b8fb20896d..b682822e80 100644 --- a/manual/release-specific/v1.2.0/index.xml +++ b/manual/release-specific/v1.2.0/index.xml @@ -19,10 +19,7 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/backup-creation-with-old-engine-image/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/backup-creation-with-old-engine-image/ - Related issue https://github.com/longhorn/longhorn/issues/2897 -Test Step Given with Longhorn v1.2.0-rc2 or above. And deploy engine image oldEI older than v1.2.0 (for example: longhornio/longhorn-engine:v1.1.2). And create volume vol-old-engine. And attach volume vol-old-engine to one of a node. And upgrade volume vol-old-engine to engine image oldEI. -When create backup of volume vol-old-engine. -Then watch kubectl kubectl get backups.longhorn.io -l backup-volume=vol-old-engine -w. And should see two backups temporarily (in transition state). And should see only one backup be left after a while. + Related issue https://github.com/longhorn/longhorn/issues/2897 Test Step Given with Longhorn v1.2.0-rc2 or above. And deploy engine image oldEI older than v1.2.0 (for example: longhornio/longhorn-engine:v1.1.2). And create volume vol-old-engine. And attach volume vol-old-engine to one of a node. And upgrade volume vol-old-engine to engine image oldEI. When create backup of volume vol-old-engine. Then watch kubectl kubectl get backups.longhorn.io -l backup-volume=vol-old-engine -w. And should see two backups temporarily (in transition state). 
And should see only one backup be left after a while. Test instance manager cleanup during uninstall @@ -36,25 +33,14 @@ Then watch kubectl kubectl get backups.longhorn.io -l backup-volume=vol-old-engi https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/label-driven-recurring-job/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/label-driven-recurring-job/ - Related issue https://github.com/longhorn/longhorn/issues/467 -Test Recurring Job Concurrency Given create snapshot recurring job with concurrency set to 2 and include snapshot recurring job default in groups. -When create volume test-job-1. -And create volume test-job-2. -And create volume test-job-3. -And create volume test-job-4. -And create volume test-job-5. -Then moniter the cron job pod log. -And should see 2 jobs created concurrently. -When update snapshot1 recurring job with concurrency set to 3. -Then moniter the cron job pod log. + Related issue https://github.com/longhorn/longhorn/issues/467 Test Recurring Job Concurrency Given create snapshot recurring job with concurrency set to 2 and include snapshot recurring job default in groups. When create volume test-job-1. And create volume test-job-2. And create volume test-job-3. And create volume test-job-4. And create volume test-job-5. Then monitor the cron job pod log. And should see 2 jobs created concurrently. When update snapshot1 recurring job with concurrency set to 3. Then monitor the cron job pod log. Test Version Bump of Kubernetes, API version group, CSI component's dependency version https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/test_version_bump/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.0/test_version_bump/ - GitHub issue: https://github.com/longhorn/longhorn/issues/2757 -Test with specific Kubernetes version For each Kubernetes version (1.18, 1.19, 1.20, 1.21, 1.22), test basic functionalities of Longhorn v1.2.0 (create/attach/detach/delete volume/backup/snapshot using yaml/UI) Test Kubernetes and Longhorn upgrade Deploy K3s v1.21 Deploy Longhorn v1.1.2 Create some workload pods using Longhorn volumes Upgrade Longhorn to v1.2.0 Verify that everything is OK Upgrade K3s to v1.22 Verify that everything is OK Retest the Upgrade Lease Lock We remove the client-go patch https://github. + GitHub issue: https://github.com/longhorn/longhorn/issues/2757 Test with specific Kubernetes version For each Kubernetes version (1.18, 1.19, 1.20, 1.21, 1.22), test basic functionalities of Longhorn v1.2.0 (create/attach/detach/delete volume/backup/snapshot using yaml/UI) Test Kubernetes and Longhorn upgrade Deploy K3s v1.21 Deploy Longhorn v1.1.2 Create some workload pods using Longhorn volumes Upgrade Longhorn to v1.2.0 Verify that everything is OK Upgrade K3s to v1.22 Verify that everything is OK Retest the Upgrade Lease Lock We remove the client-go patch https://github. diff --git a/manual/release-specific/v1.2.3/index.xml b/manual/release-specific/v1.2.3/index.xml index 5840ef0ff8..ff38d8c9f9 100644 --- a/manual/release-specific/v1.2.3/index.xml +++ b/manual/release-specific/v1.2.3/index.xml @@ -19,18 +19,14 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.3/test-backing-image-space-usage/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.3/test-backing-image-space-usage/ - Prerequisite A sparse file should be prepared before test.
e.g.: -~ touch empty-filesystem.raw ~ truncate -s 500M empty-filesystem.raw ~ mkfs.ext4 empty-filesystem.raw mke2fs 1.46.1 (9-Feb-2021) Creating filesystem with 512000 1k blocks and 128016 inodes Filesystem UUID: fe6cfb58-134a-42b3-afab-59474d9515e0 Superblock backups stored on blocks: 8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409 Allocating group tables: done Writing inode tables: done Creating journal (8192 blocks): done Writing superblocks and filesystem accounting information: done ~ shasum -a 512 empty-filesystem. + Prerequisite A sparse file should be prepared before test. e.g.: ~ touch empty-filesystem.raw ~ truncate -s 500M empty-filesystem.raw ~ mkfs.ext4 empty-filesystem.raw mke2fs 1.46.1 (9-Feb-2021) Creating filesystem with 512000 1k blocks and 128016 inodes Filesystem UUID: fe6cfb58-134a-42b3-afab-59474d9515e0 Superblock backups stored on blocks: 8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409 Allocating group tables: done Writing inode tables: done Creating journal (8192 blocks): done Writing superblocks and filesystem accounting information: done ~ shasum -a 512 empty-filesystem. Test scalability with backing image https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.3/test-scalability-with-backing-image/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.2.3/test-scalability-with-backing-image/ - Test step Deploy a cluster with 3 worker nodes. The recommended nodes is 4v cores CPU + 8G memory at least. -Deploy Longhorn. -Launch 10 backing images with the following YAML: -apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test1 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2 --- apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test2 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2 --- apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test3 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image. + Test step Deploy a cluster with 3 worker nodes. The recommended nodes is 4v cores CPU + 8G memory at least. Deploy Longhorn. Launch 10 backing images with the following YAML: apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test1 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2 --- apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test2 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2 --- apiVersion: longhorn.io/v1beta1 kind: BackingImage metadata: name: bi-test3 namespace: longhorn-system spec: sourceType: download sourceParameters: url: https://longhorn-backing-image. 
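The 10 BackingImage manifests in the scalability test above are identical apart from the name, so a short loop can generate them; this is only a convenience sketch reusing the longhorn.io/v1beta1 fields and parrot.qcow2 URL shown in the test step:
# Create bi-test1 .. bi-test10 from the same template.
for i in $(seq 1 10); do
cat <<EOF | kubectl apply -f -
apiVersion: longhorn.io/v1beta1
kind: BackingImage
metadata:
  name: bi-test${i}
  namespace: longhorn-system
spec:
  sourceType: download
  sourceParameters:
    url: https://longhorn-backing-image.s3-us-west-1.amazonaws.com/parrot.qcow2
EOF
done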
diff --git a/manual/release-specific/v1.3.0/index.xml b/manual/release-specific/v1.3.0/index.xml index 36bb4ddcd8..9aca53df8f 100644 --- a/manual/release-specific/v1.3.0/index.xml +++ b/manual/release-specific/v1.3.0/index.xml @@ -12,21 +12,14 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/extend_csi_snapshot_support/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/extend_csi_snapshot_support/ - Related issue https://github.com/longhorn/longhorn/issues/2534 -Test Setup Deploy the CSI snapshot CRDs, Controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/ Deploy 4 VolumeSnapshotClass: kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-backup-1 driver: driver.longhorn.io deletionPolicy: Delete kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-backup-2 driver: driver.longhorn.io deletionPolicy: Delete parameters: type: bak kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-snapshot driver: driver.longhorn.io deletionPolicy: Delete parameters: type: snap kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: invalid-class driver: driver.longhorn.io deletionPolicy: Delete parameters: type: invalid Create Longhorn volume test-vol of 5GB. + Related issue https://github.com/longhorn/longhorn/issues/2534 Test Setup Deploy the CSI snapshot CRDs, Controller as instructed at https://longhorn.io/docs/1.2.3/snapshots-and-backups/csi-snapshot-support/enable-csi-snapshot-support/ Deploy 4 VolumeSnapshotClass: kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-backup-1 driver: driver.longhorn.io deletionPolicy: Delete kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-backup-2 driver: driver.longhorn.io deletionPolicy: Delete parameters: type: bak kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: longhorn-snapshot driver: driver.longhorn.io deletionPolicy: Delete parameters: type: snap kind: VolumeSnapshotClass apiVersion: snapshot.storage.k8s.io/v1beta1 metadata: name: invalid-class driver: driver.longhorn.io deletionPolicy: Delete parameters: type: invalid Create Longhorn volume test-vol of 5GB. Setup and test storage network https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-storage-network/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-storage-network/ - Related issue https://github.com/longhorn/longhorn/issues/2285 -Test storage network Create AWS instances Given Create VPC. -VPC only IPv4 CIDR 10.0.0.0/16 And Create an internet gateway. -Attach to VPC And Add the internet gateway to the VPC Main route table, Routes. -Destination 0.0.0.0/0 And Create 2 subnets in the VPC. -Subnet-1: 10.0.1.0/24 Subnet-2: 10.0.2.0/24 And Launch 3 EC2 instances. -Use the created VPC Use subnet-1 for network interface 1 Use subnet-2 for network interface 2 Disable Auto-assign public IP Add security group inbound rule to allow All traffic from Anywhere-IPv4 Stop Source/destination check And Create 3 elastic IPs. + Related issue https://github.com/longhorn/longhorn/issues/2285 Test storage network Create AWS instances Given Create VPC. 
VPC only IPv4 CIDR 10.0.0.0/16 And Create an internet gateway. Attach to VPC And Add the internet gateway to the VPC Main route table, Routes. Destination 0.0.0.0/0 And Create 2 subnets in the VPC. Subnet-1: 10.0.1.0/24 Subnet-2: 10.0.2.0/24 And Launch 3 EC2 instances. Use the created VPC Use subnet-1 for network interface 1 Use subnet-2 for network interface 2 Disable Auto-assign public IP Add security group inbound rule to allow All traffic from Anywhere-IPv4 Stop Source/destination check And Create 3 elastic IPs. Test backing image download to local @@ -40,24 +33,14 @@ Use the created VPC Use subnet-1 for network interface 1 Use subnet-2 for networ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-helm-uninstall-different-namespace/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-helm-uninstall-different-namespace/ - Related issue https://github.com/longhorn/longhorn/issues/2034 -Test Given helm install Longhorn in different namespace -When helm uninstall Longhorn -Then Longhorn should complete uninstalling. + Related issue https://github.com/longhorn/longhorn/issues/2034 Test Given helm install Longhorn in different namespace When helm uninstall Longhorn Then Longhorn should complete uninstalling. Test IM Proxy connection metrics https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-grpc-proxy/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-grpc-proxy/ - Related issue https://github.com/longhorn/longhorn/issues/2821 https://github.com/longhorn/longhorn/issues/4038 -Test gRPC proxy Given Longhorn exist in the cluster. -And Monitoring stack exist in the cluster. -When Execute longhorn_instance_manager_proxy_grpc_connection in Prometheus UI. -Then Metric data shows in Prometheus UI. -When Monitor longhorn_instance_manager_proxy_grpc_connection in Grafana UI Panel. -And Run automation regression. -Then Connections should return to 0 when tests complete. + Related issue https://github.com/longhorn/longhorn/issues/2821 https://github.com/longhorn/longhorn/issues/4038 Test gRPC proxy Given Longhorn exist in the cluster. And Monitoring stack exist in the cluster. When Execute longhorn_instance_manager_proxy_grpc_connection in Prometheus UI. Then Metric data shows in Prometheus UI. When Monitor longhorn_instance_manager_proxy_grpc_connection in Grafana UI Panel. And Run automation regression. Then Connections should return to 0 when tests complete. Test instance manager NPE @@ -85,10 +68,7 @@ Then Connections should return to 0 when tests complete. https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-npe-when-longhorn-ui-deployment-not-exist/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.3.0/test-npe-when-longhorn-ui-deployment-not-exist/ - Related issue https://github.com/longhorn/longhorn/issues/4065 -Test Given helm install Longhorn -When delete deployment/longhorn-ui And update setting/kubernetes-cluster-autoscaler-enabled to true or false -Then longhorn-manager pods should still be Running. + Related issue https://github.com/longhorn/longhorn/issues/4065 Test Given helm install Longhorn When delete deployment/longhorn-ui And update setting/kubernetes-cluster-autoscaler-enabled to true or false Then longhorn-manager pods should still be Running. 
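Outside the Prometheus/Grafana UI, the same gauge can be spot-checked against the Prometheus HTTP API; a sketch assuming Prometheus is reachable at $PROMETHEUS_URL (for example via a kubectl port-forward) and that jq is installed:
# Query the instance-manager gRPC proxy connection gauge directly.
curl -s "${PROMETHEUS_URL}/api/v1/query" \
  --data-urlencode 'query=longhorn_instance_manager_proxy_grpc_connection' | jq '.data.result'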
Test snapshot purge retry diff --git a/manual/release-specific/v1.4.0/index.xml b/manual/release-specific/v1.4.0/index.xml index 1568dd76df..4502201496 100644 --- a/manual/release-specific/v1.4.0/index.xml +++ b/manual/release-specific/v1.4.0/index.xml @@ -12,115 +12,63 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-csi-plugin-liveness-probe/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-csi-plugin-liveness-probe/ - Related discussion https://github.com/longhorn/longhorn/issues/3907 -Test CSI plugin liveness probe should recover CSI socket file Given healthy Longhorn cluster -When delete the Longhorn CSI socket file on one of the node(node-1). rm /var/lib/kubelet/plugins/driver.longhorn.io/csi.sock -Then the longhorn-csi-plugin-* pod on node-1 should be restarted. -And the csi-provisioner-* pod on node-1 should be restarted. -And the csi-resizer-* pod on node-1 should be restarted. -And the csi-snapshotter-* pod on node-1 should be restarted. -And the csi-attacher-* pod on node-1 should be restarted. + Related discussion https://github.com/longhorn/longhorn/issues/3907 Test CSI plugin liveness probe should recover CSI socket file Given healthy Longhorn cluster When delete the Longhorn CSI socket file on one of the node(node-1). rm /var/lib/kubelet/plugins/driver.longhorn.io/csi.sock Then the longhorn-csi-plugin-* pod on node-1 should be restarted. And the csi-provisioner-* pod on node-1 should be restarted. And the csi-resizer-* pod on node-1 should be restarted. And the csi-snapshotter-* pod on node-1 should be restarted. And the csi-attacher-* pod on node-1 should be restarted. Test engine binary recovery https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-engine-binary-recovery/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-engine-binary-recovery/ - Related issue https://github.com/longhorn/longhorn/issues/4380 -Steps Test remove engine binary on host should recover Given EngineImage custom resource deployed -&gt; kubectl -n longhorn-system get engineimage NAME STATE IMAGE REFCOUNT BUILDDATE AGE ei-b907910b deployed longhornio/longhorn-engine:master-head 0 3d23h 2m25s And engine image pods Ready are 1/1. -&gt; kubectl -n longhorn-system get pod | grep engine-image engine-image-ei-b907910b-g4kpd 1/1 Running 0 2m43s engine-image-ei-b907910b-46k6t 1/1 Running 0 2m43s engine-image-ei-b907910b-t6wnd 1/1 Running 0 2m43s When Delete engine binary on host + Related issue https://github.com/longhorn/longhorn/issues/4380 Steps Test remove engine binary on host should recover Given EngineImage custom resource deployed &gt; kubectl -n longhorn-system get engineimage NAME STATE IMAGE REFCOUNT BUILDDATE AGE ei-b907910b deployed longhornio/longhorn-engine:master-head 0 3d23h 2m25s And engine image pods Ready are 1/1. 
&gt; kubectl -n longhorn-system get pod | grep engine-image engine-image-ei-b907910b-g4kpd 1/1 Running 0 2m43s engine-image-ei-b907910b-46k6t 1/1 Running 0 2m43s engine-image-ei-b907910b-t6wnd 1/1 Running 0 2m43s When Delete engine binary on host Test filesystem trim https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-filesystem-trim/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-filesystem-trim/ - Related issue https://github.com/longhorn/longhorn/issues/836 -Case 1: Test filesystem trim during writing Given A 10G volume created. -And Volume attached to node-1. -And Make a filesystem like EXT4 or XFS for the volume. -And Mount the filesystem on a mount point. -Then Run the below shell script with the correct mount point specified: -#!/usr/bin/env bash MOUNT_POINT=${1} dd if=/dev/urandom of=/mnt/data bs=1M count=8000 sync CKSUM=`md5sum /mnt/data | awk &#39;{print $1}&#39;` for INDEX in {1..10..1}; do rm -rf ${MOUNT_POINT}/data dd if=/mnt/data of=${MOUNT_POINT}/data &amp; RAND_SLEEP_INTERVAL=$(($(($RANDOM%50))+10)) sleep ${RAND_SLEEP_INTERVAL} fstrim ${MOUNT_POINT} while [ `ps aux | grep &#34;dd if&#34; | grep -v grep | wc -l` -eq &#34;1&#34; ] do sleep 1 done CUR_CKSUM=`md5sum ${MOUNT_POINT}/data | awk &#39;{print $1}&#39;` if [ $CUR_CKSUM ! + Related issue https://github.com/longhorn/longhorn/issues/836 Case 1: Test filesystem trim during writing Given A 10G volume created. And Volume attached to node-1. And Make a filesystem like EXT4 or XFS for the volume. And Mount the filesystem on a mount point. Then Run the below shell script with the correct mount point specified: #!/usr/bin/env bash MOUNT_POINT=${1} dd if=/dev/urandom of=/mnt/data bs=1M count=8000 sync CKSUM=`md5sum /mnt/data | awk &#39;{print $1}&#39;` for INDEX in {1..10..1}; do rm -rf ${MOUNT_POINT}/data dd if=/mnt/data of=${MOUNT_POINT}/data &amp; RAND_SLEEP_INTERVAL=$(($(($RANDOM%50))+10)) sleep ${RAND_SLEEP_INTERVAL} fstrim ${MOUNT_POINT} while [ `ps aux | grep &#34;dd if&#34; | grep -v grep | wc -l` -eq &#34;1&#34; ] do sleep 1 done CUR_CKSUM=`md5sum ${MOUNT_POINT}/data | awk &#39;{print $1}&#39;` if [ $CUR_CKSUM ! Test helm on Rancher deployed Windows Cluster https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-helm-install-on-rancher-deployed-windows-cluster/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-helm-install-on-rancher-deployed-windows-cluster/ - Related issue https://github.com/longhorn/longhorn/issues/4246 -Test Install Given Rancher cluster. -And 3 new instances for the Windows cluster following Architecture Requirements. -And docker installed on the 3 Windows cluster instances. -And Disabled Private IP Address Checks for the 3 Windows cluster instances. -And Created new Custom Windows cluster with Rancher. -Select Flannel for Network Provider Enable Windows Support -And Added the 3 nodes to the Rancher Windows cluster. -Add Linux Master Node + Related issue https://github.com/longhorn/longhorn/issues/4246 Test Install Given Rancher cluster. And 3 new instances for the Windows cluster following Architecture Requirements. And docker installed on the 3 Windows cluster instances. And Disabled Private IP Address Checks for the 3 Windows cluster instances. And Created new Custom Windows cluster with Rancher. Select Flannel for Network Provider Enable Windows Support And Added the 3 nodes to the Rancher Windows cluster. 
Add Linux Master Node Test Longhorn system backup should sync from the remote backup target https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-system-backup/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-system-backup/ - Steps Given Custom resource SystemBackup (foo) exist in AWS S3, -And System backup (foo) downloaded from AWS S3. -And Custom resource SystemBackup (foo) deleted. -When Upload the system backup (foo) to AWS S3. -And Create a new custom resource SystemBackup(foo). -This needs to be done before the system backup gets synced to the cluster. -Then Should see the synced messages in the custom resource SystemBackup(foo). -Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Syncing 9m29s longhorn-system-backup-controller Syncing system backup from backup target Normal Synced 9m28s longhorn-system-backup-controller Synced system backup from backup target + Steps Given Custom resource SystemBackup (foo) exist in AWS S3, And System backup (foo) downloaded from AWS S3. And Custom resource SystemBackup (foo) deleted. When Upload the system backup (foo) to AWS S3. And Create a new custom resource SystemBackup(foo). This needs to be done before the system backup gets synced to the cluster. Then Should see the synced messages in the custom resource SystemBackup(foo). Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Syncing 9m29s longhorn-system-backup-controller Syncing system backup from backup target Normal Synced 9m28s longhorn-system-backup-controller Synced system backup from backup target Test Node ID Change During Backing Image Creation https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-node-id-change-during-backing-image-creation/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-node-id-change-during-backing-image-creation/ - Related issue https://github.com/longhorn/longhorn/issues/4887 -Steps Given A relatively large file so that uploading it would take several minutes at least. -And Upload the file as a backing image. -And Monitor the longhorn manager pod logs. -When Add new nodes for the cluster or new disks for the existing Longhorn nodes during the upload. -Then Should see the upload success. -And Should not see error messages like below in the longhorn manager pods. + Related issue https://github.com/longhorn/longhorn/issues/4887 Steps Given A relatively large file so that uploading it would take several minutes at least. And Upload the file as a backing image. And Monitor the longhorn manager pod logs. When Add new nodes for the cluster or new disks for the existing Longhorn nodes during the upload. Then Should see the upload success. And Should not see error messages like below in the longhorn manager pods. Test Online Expansion https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-online-expansion/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-online-expansion/ - Related issue https://github.com/longhorn/longhorn/issues/1674 -Test online expansion with continuous reading/writing Given Prepare a relatively large file (5Gi for example) with the checksum calculated. -And Create and attach a volume. -And Monitor the instance manager pod logs. -When Use dd to copy data from the file to the Longhorn block device. 
-dd if=/mnt/data of=/dev/longhorn/vol bs=1M And Do online expansion for the volume during the copying. -Then The expansion should success. The corresponding block device on the attached node is expanded. + Related issue https://github.com/longhorn/longhorn/issues/1674 Test online expansion with continuous reading/writing Given Prepare a relatively large file (5Gi for example) with the checksum calculated. And Create and attach a volume. And Monitor the instance manager pod logs. When Use dd to copy data from the file to the Longhorn block device. dd if=/mnt/data of=/dev/longhorn/vol bs=1M And Do online expansion for the volume during the copying. Then The expansion should succeed. The corresponding block device on the attached node is expanded. Test replica scale-down warning https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-replica-scale-down-warning/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-replica-scale-down-warning/ - Related issue https://github.com/longhorn/longhorn/issues/4120 -Steps Given Replica Auto Balance set to least-effort or best-effort. -And Volume with 3 replicas created. -And Volume attached to node-1. -And Monitor node-1 manager pod events. -kubectl alpha events -n longhorn-system pod &lt;node-1 manager pod&gt; -w When Update replica count to 1. -Then Should see Normal replice delete event. -Normal Delete Engine/t1-e-6a846a7a Removed unknown replica tcp://10.42.2.94:10000 from engine And Should not see Warning unknown replica detect event. + Related issue https://github.com/longhorn/longhorn/issues/4120 Steps Given Replica Auto Balance set to least-effort or best-effort. And Volume with 3 replicas created. And Volume attached to node-1. And Monitor node-1 manager pod events. kubectl alpha events -n longhorn-system pod &lt;node-1 manager pod&gt; -w When Update replica count to 1. Then Should see Normal replica delete event. Normal Delete Engine/t1-e-6a846a7a Removed unknown replica tcp://10.42.2.94:10000 from engine And Should not see Warning unknown replica detect event. Test upgrade for migrated Longhorn on Rancher https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-upgrade-for-migrated-longhorn/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.0/test-upgrade-for-migrated-longhorn/ - Related discussion https://github.com/longhorn/longhorn/discussions/4198 -Context: since few customers used our broken chart longhorn 100.2.1+up1.3.1 on Rancher (Now fixed) with the workaround. We would like to verify the future upgrade path for those customers. -Steps Set up a cluster of Kubernetes 1.20. Adding this repo to the apps section in new rancher UI repo: https://github.com/PhanLe1010/charts.git branch: release-v2.6-longhorn-1.3.1. + Related discussion https://github.com/longhorn/longhorn/discussions/4198 Context: since a few customers used our broken chart longhorn 100.2.1+up1.3.1 on Rancher (now fixed) with the workaround, we would like to verify the future upgrade path for those customers. Steps Set up a cluster of Kubernetes 1.20. Add this repo to the apps section in the new Rancher UI: repo: https://github.com/PhanLe1010/charts.git branch: release-v2.6-longhorn-1.3.1.
Access old rancher UI by navigating to &lt;your-rancher-url&gt;/g. Install Longhorn 1.0.2. Create/attach some volumes. Create a few recurring snapshot/backup job that run every minutes. diff --git a/manual/release-specific/v1.4.1/index.xml b/manual/release-specific/v1.4.1/index.xml index 8ae161044e..c3d0f3caf9 100644 --- a/manual/release-specific/v1.4.1/index.xml +++ b/manual/release-specific/v1.4.1/index.xml @@ -12,13 +12,7 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.1/test-the-trim-related-option-update-for-old-volumes/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.4.1/test-the-trim-related-option-update-for-old-volumes/ - Related issue https://github.com/longhorn/longhorn/issues/5218 -Test step Given Deploy Longhorn v1.3.2 -And Created and attached a volume. -And Upgrade Longhorn to the latest. -And Do live upgrade for the volume. (The 1st volume using the latest engine image but running in the old instance manager.) -And Created and attached a volume with the v1.3.2 engine image. (The 2nd volume using the old engine image but running in the new instance manager.) -When Try to update volume. + Related issue https://github.com/longhorn/longhorn/issues/5218 Test step Given Deploy Longhorn v1.3.2 And Created and attached a volume. And Upgrade Longhorn to the latest. And Do live upgrade for the volume. (The 1st volume using the latest engine image but running in the old instance manager.) And Created and attached a volume with the v1.3.2 engine image. (The 2nd volume using the old engine image but running in the new instance manager.) When Try to update volume. diff --git a/manual/release-specific/v1.5.0/index.xml b/manual/release-specific/v1.5.0/index.xml index 5dbee9398b..60e2b05ef0 100644 --- a/manual/release-specific/v1.5.0/index.xml +++ b/manual/release-specific/v1.5.0/index.xml @@ -12,25 +12,14 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.5.0/test-upgrade-responder-collect-extra-info/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.5.0/test-upgrade-responder-collect-extra-info/ - Related issue https://github.com/longhorn/longhorn/issues/5235 -Test step Given Patch build and deploy Longhorn. -diff --git a/controller/setting_controller.go b/controller/setting_controller.go index de77b7246..ac6263ac5 100644 --- a/controller/setting_controller.go +++ b/controller/setting_controller.go @@ -49,7 +49,7 @@ const ( var ( upgradeCheckInterval = time.Hour settingControllerResyncPeriod = time.Hour - checkUpgradeURL = &#34;https://longhorn-upgrade-responder.rancher.io/v1/checkupgrade&#34; + checkUpgradeURL = &#34;http://longhorn-upgrade-responder.default.svc.cluster.local:8314/v1/checkupgrade&#34; ) type SettingController struct { Match the checkUpgradeURL with the application name: http://&lt;APP_NAME&gt;-upgrade-responder.default.svc.cluster.local:8314/v1/checkupgrade -And Deploy upgrade responder stack. -When Wait 1~2 hours for collection data to send to the influxDB database. + Related issue https://github.com/longhorn/longhorn/issues/5235 Test step Given Patch build and deploy Longhorn. 
diff --git a/controller/setting_controller.go b/controller/setting_controller.go index de77b7246..ac6263ac5 100644 --- a/controller/setting_controller.go +++ b/controller/setting_controller.go @@ -49,7 +49,7 @@ const ( var ( upgradeCheckInterval = time.Hour settingControllerResyncPeriod = time.Hour - checkUpgradeURL = &#34;https://longhorn-upgrade-responder.rancher.io/v1/checkupgrade&#34; + checkUpgradeURL = &#34;http://longhorn-upgrade-responder.default.svc.cluster.local:8314/v1/checkupgrade&#34; ) type SettingController struct { Match the checkUpgradeURL with the application name: http://&lt;APP_NAME&gt;-upgrade-responder.default.svc.cluster.local:8314/v1/checkupgrade And Deploy upgrade responder stack. When Wait 1~2 hours for collection data to send to the influxDB database. Test Volume Replica Zone Soft Anti-Affinity Setting https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.5.0/test-the-volume-replica-scheduling/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.5.0/test-the-volume-replica-scheduling/ - Related issue https://github.com/longhorn/longhorn/issues/5358 -Test step - Enable Volume Replica Zone Soft Anti-Affinity Setting Given EKS Cluster with 3 nodes across 2 AWS zones (zone#1, zone#2) -And Deploy Longhorn v1.5.0 -And Disable global replica zone anti-affinity -And Create a volume with 2 replicas, replicaZoneSoftAntiAffinity=enabled and attach it to a node. -When Scale volume replicas to 3 -Then New replica should be scheduled -And No error messages in the longhorn manager pod logs. + Related issue https://github.com/longhorn/longhorn/issues/5358 Test step - Enable Volume Replica Zone Soft Anti-Affinity Setting Given EKS Cluster with 3 nodes across 2 AWS zones (zone#1, zone#2) And Deploy Longhorn v1.5.0 And Disable global replica zone anti-affinity And Create a volume with 2 replicas, replicaZoneSoftAntiAffinity=enabled and attach it to a node. When Scale volume replicas to 3 Then New replica should be scheduled And No error messages in the longhorn manager pod logs. diff --git a/manual/release-specific/v1.6.0/index.xml b/manual/release-specific/v1.6.0/index.xml index df6d6c86b5..4737d15640 100644 --- a/manual/release-specific/v1.6.0/index.xml +++ b/manual/release-specific/v1.6.0/index.xml @@ -12,80 +12,49 @@ https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-storage-network/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-storage-network/ - Related issue https://github.com/longhorn/longhorn/issues/6953 -Test storage network Create AWS instances Given Create VPC. -VPC only IPv4 CIDR 10.0.0.0/16 And Create an internet gateway. -Attach to VPC And Add the internet gateway to the VPC Main route table, Routes. -Destination 0.0.0.0/0 And Create 2 subnets in the VPC. -Subnet-1: 10.0.1.0/24 Subnet-2: 10.0.2.0/24 And Launch 3 EC2 instances. -Use the created VPC Use subnet-1 for network interface 1 Use subnet-2 for network interface 2 Disable Auto-assign public IP Add security group inbound rule to allow All traffic from Anywhere-IPv4 Stop Source/destination check And Create 3 elastic IPs. + Related issue https://github.com/longhorn/longhorn/issues/6953 Test storage network Create AWS instances Given Create VPC. VPC only IPv4 CIDR 10.0.0.0/16 And Create an internet gateway. Attach to VPC And Add the internet gateway to the VPC Main route table, Routes. Destination 0.0.0.0/0 And Create 2 subnets in the VPC. 
Subnet-1: 10.0.1.0/24 Subnet-2: 10.0.2.0/24 And Launch 3 EC2 instances. Use the created VPC Use subnet-1 for network interface 1 Use subnet-2 for network interface 2 Disable Auto-assign public IP Add security group inbound rule to allow All traffic from Anywhere-IPv4 Stop Source/destination check And Create 3 elastic IPs. Test `Rebuild` in volume.meta blocks engine start https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-rebuild-in-meta-blocks-engine-start/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-rebuild-in-meta-blocks-engine-start/ - Related issue https://github.com/longhorn/longhorn/issues/6626 -Test with patched image Given a patched longhorn-engine image with the following code change. -diff --git a/pkg/sync/sync.go b/pkg/sync/sync.go index b48ddd46..c4523f11 100644 --- a/pkg/sync/sync.go +++ b/pkg/sync/sync.go @@ -534,9 +534,9 @@ func (t *Task) reloadAndVerify(address, instanceName string, repClient *replicaC return err } - if err := repClient.SetRebuilding(false); err != nil { - return err - } + // if err := repClient.SetRebuilding(false); err != nil { + // return err + // } return nil } And a patched longhorn-instance-manager image with the longhorn-engine vendor updated. + Related issue https://github.com/longhorn/longhorn/issues/6626 Test with patched image Given a patched longhorn-engine image with the following code change. diff --git a/pkg/sync/sync.go b/pkg/sync/sync.go index b48ddd46..c4523f11 100644 --- a/pkg/sync/sync.go +++ b/pkg/sync/sync.go @@ -534,9 +534,9 @@ func (t *Task) reloadAndVerify(address, instanceName string, repClient *replicaC return err } - if err := repClient.SetRebuilding(false); err != nil { - return err - } + // if err := repClient.SetRebuilding(false); err != nil { + // return err + // } return nil } And a patched longhorn-instance-manager image with the longhorn-engine vendor updated. 
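To confirm a replica was left in the rebuilding state that the volume.meta test above depends on, the Rebuilding flag can be read from the replica's volume.meta on the replica node; the /var/lib/longhorn data path and the JSON field name below are assumptions based on a default installation, not something this test case spells out:
# Print the Rebuilding flag recorded in each replica's volume.meta.
for meta in /var/lib/longhorn/replicas/*/volume.meta; do
  echo "${meta}:"
  grep -o '"Rebuilding":[a-z]*' "${meta}"
done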
Test PVC Name and Namespace included in the volume metrics https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-pvc-name-and-namespace-included-in-volume-metrics/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-pvc-name-and-namespace-included-in-volume-metrics/ - Related issues https://github.com/longhorn/longhorn/issues/5297 https://github.com/longhorn/longhorn-manager/pull/2284 Test step Given created 2 volumes (volume-1, volume-2) -When PVC created for volume (volume-1) And attached volumes (volume-1, volume-2) -Then metrics with longhorn_volume_ prefix should include pvc=&quot;volume-1&quot; -curl -sSL http://10.0.2.212:32744/metrics | grep longhorn_volume | grep ip-10-0-2-151 | grep volume-1 longhorn_volume_actual_size_bytes{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_capacity_bytes{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 1.073741824e+09 longhorn_volume_read_iops{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_read_latency{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_read_throughput{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_robustness{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 1 longhorn_volume_state{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 2 longhorn_volume_write_iops{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_write_latency{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_write_throughput{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 And metrics with longhorn_volume_ prefix should include pvc=&quot;&quot; for (volume-2) + Related issues https://github.com/longhorn/longhorn/issues/5297 https://github.com/longhorn/longhorn-manager/pull/2284 Test step Given created 2 volumes (volume-1, volume-2) When PVC created for volume (volume-1) And attached volumes (volume-1, volume-2) Then metrics with longhorn_volume_ prefix should include pvc=&quot;volume-1&quot; curl -sSL http://10.0.2.212:32744/metrics | grep longhorn_volume | grep ip-10-0-2-151 | grep volume-1 longhorn_volume_actual_size_bytes{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_capacity_bytes{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 1.073741824e+09 longhorn_volume_read_iops{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_read_latency{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_read_throughput{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 
longhorn_volume_robustness{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 1 longhorn_volume_state{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 2 longhorn_volume_write_iops{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_write_latency{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 longhorn_volume_write_throughput{pvc_namespace=&#34;default&#34;,node=&#34;ip-10-0-2-151&#34;,pvc=&#34;volume-1&#34;,volume=&#34;volume-1&#34;} 0 And metrics with longhorn_volume_ prefix should include pvc=&quot;&quot; for (volume-2) Test Replica Disk Soft Anti-Affinity https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-replica-disk-soft-anti-affinity/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-replica-disk-soft-anti-affinity/ - Related issue https://github.com/longhorn/longhorn/issues/3823 -Test initial behavior of global Replica Disk Soft Anti-Affinity setting Given A newly created Longhorn cluster -Then Replica Zone Disk Anti-Affinity shows as false in the UI -And the replica-soft-anti-affinity setting shows false with kubectl -Test initial behavior of global Replica Disk Soft Anti-Affinity setting after upgrade Given A newly upgraded Longhorn cluster -Then Replica Zone Disk Anti-Affinity shows as false in the UI -And the replica-soft-anti-affinity shows false with kubectl + Related issue https://github.com/longhorn/longhorn/issues/3823 Test initial behavior of global Replica Disk Soft Anti-Affinity setting Given A newly created Longhorn cluster Then Replica Zone Disk Anti-Affinity shows as false in the UI And the replica-soft-anti-affinity setting shows false with kubectl Test initial behavior of global Replica Disk Soft Anti-Affinity setting after upgrade Given A newly upgraded Longhorn cluster Then Replica Zone Disk Anti-Affinity shows as false in the UI And the replica-soft-anti-affinity shows false with kubectl Test Support Bundle Metadata File https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-metadata-file/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-metadata-file/ - Related issue https://github.com/longhorn/longhorn/issues/6997 -Test Given Longhorn installed on SUSE Linux -When generated support-bundle with description and issue URL -Then issuedescription has the description in the metadata.yaml -And issueurl has the issue URL in the metadata.yaml + Related issue https://github.com/longhorn/longhorn/issues/6997 Test Given Longhorn installed on SUSE Linux When generated support-bundle with description and issue URL Then issuedescription has the description in the metadata.yaml And issueurl has the issue URL in the metadata.yaml Test Support Bundle Should Include Kubelet Log When On K3s Cluster https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-kubelet-log-for-k3s/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-kubelet-log-for-k3s/ - Related issue https://github.com/longhorn/longhorn/issues/7121 -Test Given Longhorn installed on K3s cluster -When generated support-bundle -Then should have worker node 
kubelet logs in k3s-agent-service.log -And should have control-plane node kubelet log in k3s-service.log (if Longhorn is deployed on a control-plane node) + Related issue https://github.com/longhorn/longhorn/issues/7121 Test Given Longhorn installed on K3s cluster When generated support-bundle Then should have worker node kubelet logs in k3s-agent-service.log And should have control-plane node kubelet log in k3s-service.log (if Longhorn is deployed on a control-plane node) Test Support Bundle Syslog Paths https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-syslog-paths/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/release-specific/v1.6.0/test-support-bundle-syslog-paths/ - Related issue https://github.com/longhorn/longhorn/issues/6544 -Test /var/log/messages Given Longhorn installed on SUSE Linux -When generated support-bundle -And syslog exists in the messages file -Test /var/log/syslog Given Longhorn installed on Ubuntu Linux -When generated support-bundle -And syslog exists in the syslog file + Related issue https://github.com/longhorn/longhorn/issues/6544 Test /var/log/messages Given Longhorn installed on SUSE Linux When generated support-bundle And syslog exists in the messages file Test /var/log/syslog Given Longhorn installed on Ubuntu Linux When generated support-bundle And syslog exists in the syslog file diff --git a/manual/test-cases-to-reproduce-attach-detach-issues/index.xml b/manual/test-cases-to-reproduce-attach-detach-issues/index.xml index 406ecffdd2..a4658ea1ff 100644 --- a/manual/test-cases-to-reproduce-attach-detach-issues/index.xml +++ b/manual/test-cases-to-reproduce-attach-detach-issues/index.xml @@ -12,8 +12,7 @@ https://longhorn.github.io/longhorn-tests/manual/test-cases-to-reproduce-attach-detach-issues/attachment-detachment-issues-reproducibility/ Mon, 01 Jan 0001 00:00:00 +0000 https://longhorn.github.io/longhorn-tests/manual/test-cases-to-reproduce-attach-detach-issues/attachment-detachment-issues-reproducibility/ - Prerequisite: Have an environment with just 2 worker nodes or taint 1 out of 3 worker nodes to be NoExecute &amp; NoSchedule. This will serve as a constrained fallback and limited source of recovery in the event of failure. -1. Kill the engines and instance manager repeatedly Given 1 RWO and 1 RWX volume is attached to a pod. And Both the volumes have 2 replicas. And Random data is continuously being written to the volume using command dd if=/dev/urandom of=file1 count=100 bs=1M conv=fsync status=progress oflag=direct,sync + Prerequisite: Have an environment with just 2 worker nodes or taint 1 out of 3 worker nodes to be NoExecute &amp; NoSchedule. This will serve as a constrained fallback and limited source of recovery in the event of failure. 1. Kill the engines and instance manager repeatedly Given 1 RWO and 1 RWX volume is attached to a pod. And Both the volumes have 2 replicas. And Random data is continuously being written to the volume using command dd if=/dev/urandom of=file1 count=100 bs=1M conv=fsync status=progress oflag=direct,sync
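A rough Go sketch for the "PVC Name and Namespace included in the volume metrics" test earlier in this diff: it automates the curl/grep check shown there. The endpoint address in the sample output (http://10.0.2.212:32744/metrics) is environment-specific, so pass your own manager metrics endpoint as the first argument; the checks simply encode the expectation that volume-1 samples carry pvc="volume-1" while volume-2 samples carry an empty pvc label.

// verify_volume_metrics.go -- sketch of the label check; the default endpoint is environment-specific.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"os"
	"strings"
)

func main() {
	endpoint := "http://10.0.2.212:32744/metrics" // taken from the sample output; replace for your cluster
	if len(os.Args) > 1 {
		endpoint = os.Args[1]
	}
	resp, err := http.Get(endpoint)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	scanner.Buffer(make([]byte, 1024*1024), 1024*1024)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "longhorn_volume_") {
			continue
		}
		// volume-1 has a PVC, so its samples should carry pvc="volume-1".
		if strings.Contains(line, `volume="volume-1"`) && !strings.Contains(line, `pvc="volume-1"`) {
			fmt.Println("unexpected label set for volume-1:", line)
		}
		// volume-2 has no PVC, so its samples should carry an empty pvc label.
		if strings.Contains(line, `volume="volume-2"`) && !strings.Contains(line, `pvc=""`) {
			fmt.Println("unexpected label set for volume-2:", line)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}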
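For the attach/detach reproducibility prerequisite above, the dd invocation (dd if=/dev/urandom of=file1 count=100 bs=1M conv=fsync status=progress oflag=direct,sync) can be approximated in Go when dd is not available inside the test pod; this is a loose stand-in, not an exact equivalent. O_DIRECT is omitted because it needs aligned buffers; O_SYNC plus a final Sync stands in for oflag=sync and conv=fsync.

// write_random.go -- loose stand-in for the dd command; run it from the volume mount point.
package main

import (
	"crypto/rand"
	"fmt"
	"os"
)

func main() {
	const (
		blockSize = 1 << 20 // 1 MiB, matching bs=1M
		count     = 100     // matching count=100
	)
	f, err := os.OpenFile("file1", os.O_WRONLY|os.O_CREATE|os.O_TRUNC|os.O_SYNC, 0o644)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	buf := make([]byte, blockSize)
	for i := 0; i < count; i++ {
		if _, err := rand.Read(buf); err != nil { // random data, as with /dev/urandom
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		if _, err := f.Write(buf); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("\rwrote %d MiB", i+1) // rough status=progress substitute
	}
	if err := f.Sync(); err != nil { // conv=fsync
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println()
}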