
[TEST] Negative test case gets stuck in waiting for a non-existing pod stable but the pod has been running #8193

Closed
yangchiu opened this issue Mar 18, 2024 · 10 comments
Labels
kind/test Request for adding test


@yangchiu
Member

What's the test to develop? Please describe

Running the negative test case Reboot Node One By One While Workload Heavy Writing gets stuck waiting for a deployment pod to become stable:

https://ci.longhorn.io/job/private/job/longhorn-e2e-test/423/

Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4726) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4727) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4728) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4729) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4730) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4731) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4732) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4733) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4734) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4735) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4736) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4737) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4738) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4739) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4740) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4741) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4742) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4743) ...
Waiting for e2e-test-deployment-1 pods ['e2e-test-deployment-1-dbc678584-jxcd7'] stable, retry (4744) ...

But the pod has been running:

# kubectl get pods -w
NAME                                     READY   STATUS    RESTARTS      AGE
e2e-test-deployment-2-7ddccb49f4-qdm9c   1/1     Running   0             86m
e2e-test-deployment-1-dbc678584-jxcd7    1/1     Running   0             87m
longhorn-test-minio-f4bbdc54d-4p9rd      1/1     Running   1 (84m ago)   100m
e2e-test-deployment-0-b957b9f54-97f9v    1/1     Running   0             84m
longhorn-test-nfs-6b985fc5fd-rqwq7       1/1     Running   3 (83m ago)   100m
e2e-test-statefulset-0-0                 1/1     Running   0             84m
e2e-test-statefulset-2-0                 1/1     Running   0             82m
e2e-test-statefulset-1-0                 1/1     Running   0             80m

supportbundle_4ed56920-80b0-49f6-b5be-eb726189c7ff_2024-03-18T02-52-15Z.zip

Describe the tasks for the test

Additional context

@yangchiu yangchiu added the kind/test Request for adding test label Mar 18, 2024
@github-project-automation github-project-automation bot moved this to To do in QA Sprint Mar 18, 2024
@yangchiu
Member Author

cc @c3y1huang

@c3y1huang c3y1huang self-assigned this Mar 19, 2024
@yangchiu
Member Author

The same test case, another type of failure:

ApiException: (0)
Reason: Handshake status 500 Internal Server Error -+-+- 
{'content-length': '29', 'content-type': 'text/plain; charset=utf-8', 'date': 'Mon, 18 Mar 2024 12:42:25 GMT'} -+-+- 
b'container not found ("sleep")'

https://ci.longhorn.io/job/private/job/longhorn-e2e-test/427/

@yangchiu yangchiu changed the title [TEST][BUG] Negative test case gets stuck in waiting for a non-exisitng pod stable but the pod has been running [TEST] Falky test case Reboot Node One By One While Workload Heavy Writing Mar 19, 2024
@yangchiu yangchiu changed the title [TEST] Falky test case Reboot Node One By One While Workload Heavy Writing [TEST] Flaky test case Reboot Node One By One While Workload Heavy Writing Mar 19, 2024
@c3y1huang
Contributor

Running the negative test case Reboot Node One By One While Workload Heavy Writing gets stuck waiting for a deployment pod to become stable:
But the pod has been running:

[DEBUG] pod e2e-test-deployment-0-b957b9f54-4rt5p retry 59 != 60
Waiting for e2e-test-deployment-0 pods ['e2e-test-deployment-0-b957b9f54-4rt5p', 'e2e-test-deployment-0-b957b9f54-tnpzq'] stable, retry (59) ...
Waiting for e2e-test-deployment-0 pods ['e2e-test-deployment-0-b957b9f54-tnpzq'] stable, retry (60) ...
[DEBUG] pod e2e-test-deployment-0-b957b9f54-4rt5p retry 61 != 60

Somehow the count skipped 60, so a strict equality check against the maximum retry count never fired and the wait loop kept going. To address this, we can check whether the retry count is greater than or equal to the maximum retry count.
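
A minimal sketch of the kind of check being proposed (the helper, constant names, and polling interval below are hypothetical illustrations, not the actual longhorn-tests code):

```python
import time

WAIT_POD_STABLE_MAX_RETRY = 60   # hypothetical retry budget
WAIT_INTERVAL_SECONDS = 5        # hypothetical polling interval


def wait_for_pods_stable(get_unstable_pods):
    """Poll until no pod is reported unstable, or the retry budget is exhausted.

    `get_unstable_pods` is a hypothetical callable returning the pods that
    are not yet stable.
    """
    retry = 0
    while True:
        unstable = get_unstable_pods()
        if not unstable:
            return
        # Using `>=` rather than `==` keeps the loop bounded even if a retry
        # number is skipped (e.g. 59 -> 61, as in the debug output above).
        if retry >= WAIT_POD_STABLE_MAX_RETRY:
            raise AssertionError(f"pods {unstable} not stable after {retry} retries")
        print(f"Waiting for pods {unstable} stable, retry ({retry}) ...")
        time.sleep(WAIT_INTERVAL_SECONDS)
        retry += 1
```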

@c3y1huang
Contributor

c3y1huang commented Mar 19, 2024

The same test case, another type of failure:

ApiException: (0)
Reason: Handshake status 500 Internal Server Error -+-+- 
{'content-length': '29', 'content-type': 'text/plain; charset=utf-8', 'date': 'Mon, 18 Mar 2024 12:42:25 GMT'} -+-+- 
b'container not found ("sleep")'

https://ci.longhorn.io/job/private/job/longhorn-e2e-test/427/

I don't know what has triggered this.

  1. Volume attached at 12:31:39.
Mar 18 12:31:39 ip-10-0-2-28 k3s[1352]: I0318 12:31:39.291688    1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"csi-8c8c2b911395deaf705fc0a43968072e1ac253115a720b249e14f9c24e8755ed\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
  2. Deployment verified stable at 12:37:59 (showing 20:37:59.360 in the report).
deployment . And Wait for deployment 0 pods stable
Start / End / Elapsed:	20240318 20:36:53.003 / 20240318 20:37:59.360 / 00:01:06.357
  3. The volume was unmounted.
Mar 18 12:41:49 ip-10-0-2-28 k3s[1352]: E0318 12:41:49.612041    1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:43:51.612014825 +0000 UTC m=+986.522755894 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0") pod "e2e-test-deployment-2-7ddccb49f4-l47jl" (UID: "89d256e9-5f00-4ac5-a768-94c41fe9f365") : rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:41:49 ip-10-0-2-28 k3s[1352]: E0318 12:41:49.611852    1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:41:49 ip-10-0-2-28 k3s[1352]: I0318 12:41:49.593178    1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"csi-8c8c2b911395deaf705fc0a43968072e1ac253115a720b249e14f9c24e8755ed\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:41:49 ip-10-0-2-28 k3s[1352]: I0318 12:41:49.589369    1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:40:54 ip-10-0-2-28 k3s[1352]: E0318 12:40:54.336071    1352 pod_workers.go:1300] "Error syncing pod, skipping" err="unmounted volumes=[pod-data], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/e2e-test-statefulset-2-0" podUID="51755125-cf24-4164-814c-ff779da0505c"
Mar 18 12:40:53 ip-10-0-2-28 k3s[1352]: E0318 12:40:53.336090    1352 pod_workers.go:1300] "Error syncing pod, skipping" err="unmounted volumes=[pod-data], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl" podUID="89d256e9-5f00-4ac5-a768-94c41fe9f365"
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: E0318 12:39:47.768586    1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:41:49.768566096 +0000 UTC m=+864.679307165 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-fb423261-1200-48a9-8077-e177d56746e2" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2") pod "e2e-test-statefulset-2-0" (UID: "51755125-cf24-4164-814c-ff779da0505c") : rpc error: code = InvalidArgument desc = volume pvc-fb423261-1200-48a9-8077-e177d56746e2 hasn't been attached yet
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: E0318 12:39:47.768380    1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-fb423261-1200-48a9-8077-e177d56746e2 hasn't been attached yet
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: I0318 12:39:47.745025    1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-fb423261-1200-48a9-8077-e177d56746e2\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2\") pod \"e2e-test-statefulset-2-0\" (UID: \"51755125-cf24-4164-814c-ff779da0505c\") DevicePath \"csi-e3372656429ba5ddd2e47ad9bd23d10b758e716cac78ab1764808b39e824c032\"" pod="default/e2e-test-statefulset-2-0"
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: I0318 12:39:47.741015    1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-fb423261-1200-48a9-8077-e177d56746e2\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2\") pod \"e2e-test-statefulset-2-0\" (UID: \"51755125-cf24-4164-814c-ff779da0505c\") DevicePath \"\"" pod="default/e2e-test-statefulset-2-0"
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: E0318 12:39:47.562797    1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:41:49.562778041 +0000 UTC m=+864.473519756 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0") pod "e2e-test-deployment-2-7ddccb49f4-l47jl" (UID: "89d256e9-5f00-4ac5-a768-94c41fe9f365") : rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: E0318 12:39:47.562639    1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: I0318 12:39:47.543889    1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"csi-8c8c2b911395deaf705fc0a43968072e1ac253115a720b249e14f9c24e8755ed\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:39:47 ip-10-0-2-28 k3s[1352]: I0318 12:39:47.540153    1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:38:36 ip-10-0-2-28 k3s[1352]: E0318 12:38:36.336254    1352 pod_workers.go:1300] "Error syncing pod, skipping" err="unmounted volumes=[pod-data], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl" podUID="89d256e9-5f00-4ac5-a768-94c41fe9f365"
Mar 18 12:38:36 ip-10-0-2-28 k3s[1352]: E0318 12:38:36.336255    1352 pod_workers.go:1300] "Error syncing pod, skipping" err="unmounted volumes=[pod-data], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/e2e-test-statefulset-2-0" podUID="51755125-cf24-4164-814c-ff779da0505c"
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: E0318 12:37:45.734339    1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:39:47.734319757 +0000 UTC m=+742.645061587 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-fb423261-1200-48a9-8077-e177d56746e2" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2") pod "e2e-test-statefulset-2-0" (UID: "51755125-cf24-4164-814c-ff779da0505c") : rpc error: code = InvalidArgument desc = volume pvc-fb423261-1200-48a9-8077-e177d56746e2 hasn't been attached yet
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: E0318 12:37:45.734181    1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-fb423261-1200-48a9-8077-e177d56746e2 hasn't been attached yet
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: I0318 12:37:45.717154    1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-fb423261-1200-48a9-8077-e177d56746e2\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2\") pod \"e2e-test-statefulset-2-0\" (UID: \"51755125-cf24-4164-814c-ff779da0505c\") DevicePath \"csi-e3372656429ba5ddd2e47ad9bd23d10b758e716cac78ab1764808b39e824c032\"" pod="default/e2e-test-statefulset-2-0"
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: I0318 12:37:45.713199    1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-fb423261-1200-48a9-8077-e177d56746e2\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-fb423261-1200-48a9-8077-e177d56746e2\") pod \"e2e-test-statefulset-2-0\" (UID: \"51755125-cf24-4164-814c-ff779da0505c\") DevicePath \"\"" pod="default/e2e-test-statefulset-2-0"
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: E0318 12:37:45.534198    1352 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 podName: nodeName:}" failed. No retries permitted until 2024-03-18 12:39:47.534177625 +0000 UTC m=+742.444919394 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0") pod "e2e-test-deployment-2-7ddccb49f4-l47jl" (UID: "89d256e9-5f00-4ac5-a768-94c41fe9f365") : rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: E0318 12:37:45.534030    1352 csi_attacher.go:364] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = InvalidArgument desc = volume pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0 hasn't been attached yet
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: I0318 12:37:45.515715    1352 operation_generator.go:633] "MountVolume.WaitForAttach succeeded for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"csi-8c8c2b911395deaf705fc0a43968072e1ac253115a720b249e14f9c24e8755ed\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
Mar 18 12:37:45 ip-10-0-2-28 k3s[1352]: I0318 12:37:45.511729    1352 operation_generator.go:623] "MountVolume.WaitForAttach entering for volume \"pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\" (UniqueName: \"kubernetes.io/csi/driver.longhorn.io^pvc-9b4adf15-a26f-42f9-b62b-f0d6b501f8e0\") pod \"e2e-test-deployment-2-7ddccb49f4-l47jl\" (UID: \"89d256e9-5f00-4ac5-a768-94c41fe9f365\") DevicePath \"\"" pod="default/e2e-test-deployment-2-7ddccb49f4-l47jl"
  4. Hit container not found error at 12:42:25.
ApiException: (0)
Reason: Handshake status 500 Internal Server Error -+-+- {'content-length': '29', 'content-type': 'text/plain; charset=utf-8', 'date': 'Mon, 18 Mar 2024 12:42:25 GMT'} -+-+- b'container not found ("sleep")'
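
For context, an ApiException with status 0 and a handshake failure like this is typically what the Kubernetes Python client surfaces when a WebSocket exec into the pod cannot find the target container, for example because the container was recreated while a node rebooted. A hedged sketch of tolerating this during the test, assuming the test execs via kubernetes.stream (the helper name and retry policy are assumptions, not the actual test code):

```python
import time

from kubernetes import client, config
from kubernetes.client.rest import ApiException
from kubernetes.stream import stream


def exec_in_pod_with_retry(pod_name, command, namespace="default",
                           retries=5, interval=5):
    """Exec `command` in `pod_name`, retrying while the container is briefly
    missing (e.g. being recreated after a node reboot)."""
    config.load_kube_config()
    core_api = client.CoreV1Api()
    for attempt in range(retries):
        try:
            return stream(
                core_api.connect_get_namespaced_pod_exec,
                pod_name,
                namespace,
                command=command,
                stderr=True, stdin=False, stdout=True, tty=False,
            )
        except ApiException as exc:
            # "Handshake status 500 ... container not found" lands here when
            # the container disappeared between pod lookup and exec.
            print(f"exec attempt {attempt} failed: {exc}; retrying")
            time.sleep(interval)
    raise RuntimeError(f"exec into {pod_name} still failing after {retries} attempts")
```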

@yangchiu, any idea? Do we hit this often?

@yangchiu
Member Author

Do we hit this often?

Never seen this in previous release testing phases. Since this is the first release testing after the refactoring, it will take more time to figure out the reproducibility.

@yangchiu
Member Author

Another type of failure:

Got /data/random-data checksum = d41d8cd98f00b204e9800998ecf8427e

Expected checksum = dd: can't open '/data/random-data': Input/output error
d41d8cd98f00b204e9800998ecf8427e

https://ci.longhorn.io/job/private/job/longhorn-e2e-test/430/

Need to check whether it's a real issue or test case defect.
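
As a side note, d41d8cd98f00b204e9800998ecf8427e is the md5sum of empty input, and the dd error message was concatenated into the expected-checksum string, so both reads may simply have returned no data during the I/O error. A hedged sketch of one way a test helper could keep dd's diagnostics out of the checksum and fail fast on read errors (this helper is hypothetical, not the actual test code):

```python
import hashlib
import subprocess


def md5_of_file_in_pod(pod, path, namespace="default"):
    """Stream `path` out of `pod` with dd and md5 it locally.

    dd writes data to stdout and diagnostics to stderr, so capturing them
    separately keeps error text out of the checksum, and a non-zero exit
    code (e.g. an I/O error) fails fast instead of hashing empty output.
    """
    result = subprocess.run(
        ["kubectl", "exec", "-n", namespace, pod, "--",
         "dd", f"if={path}", "bs=1M"],
        capture_output=True,
    )
    if result.returncode != 0:
        raise IOError(f"reading {path} in {pod} failed: {result.stderr.decode()}")
    return hashlib.md5(result.stdout).hexdigest()
```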

@c3y1huang
Contributor

c3y1huang commented Mar 19, 2024

Another type of failure:

Got /data/random-data checksum = d41d8cd98f00b204e9800998ecf8427e

Expected checksum = dd: can't open '/data/random-data': Input/output error
d41d8cd98f00b204e9800998ecf8427e

https://ci.longhorn.io/job/private/job/longhorn-e2e-test/430/

Need to check whether it's a real issue or test case defect.

Could you help create a separate issue for this, since it is a different kind of failure? Combining newly discovered failures into the original issue makes it very hard to weigh its complexity. For this issue, I will just fix the looping problem described in the description. Thank you.

@c3y1huang c3y1huang changed the title [TEST] Flaky test case Reboot Node One By One While Workload Heavy Writing [TEST] Negative test case gets stuck in waiting for a non-exisitng pod stable but the pod has been running Mar 19, 2024
@yangchiu
Member Author

Could you help create a separate issue for this

Created #8208 for it. Thank you!

@longhorn-io-github-bot

longhorn-io-github-bot commented Mar 19, 2024

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at: issue description

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore, etc.) (including backport-needed/*)?
    The PR is at

  • Which areas/issues this PR might have potential impacts on?
    Area negative testing
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at fix(negative): wait_for_workload_pods_stable longhorn-tests#1821
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@c3y1huang
Contributor

Closing this issue because the PR has been merged.

@github-project-automation github-project-automation bot moved this from To do to Done in QA Sprint Mar 19, 2024
@derekbit derekbit moved this to Closed in Longhorn Sprint Aug 3, 2024