
MountDevice failed log spamming for statefulset volumes after RKE version upgrade #3203

Open
bandak2 opened this issue Mar 16, 2023 · 25 comments


bandak2 commented Mar 16, 2023

RKE version
v1.3.18

Docker version
20.10.17-ce

Operating system and kernel
SUSE Linux Enterprise Server 15 SP4 -- Kernel 5.14.21-150400.22-default

Type/provider of hosts
VMware VM

cluster.yml file:

nodes:
- address: worker1
  port: "22"
  internal_address: ""
  hostname_override: ""
  user: "root"
  ssh_key: ""
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
  ssh_key_path: ~/.ssh/id_rsa
  docker_socket: /var/run/docker.sock
  role: [worker]
- address: master
  port: "22"
  internal_address: ""
  hostname_override: ""
  user: "root"
  ssh_key: ""
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
  ssh_key_path: ~/.ssh/id_rsa
  docker_socket: /var/run/docker.sock
  role: [controlplane, etcd]
- address: worker2
  port: "22"
  internal_address: ""
  hostname_override: ""
  user: "root"
  ssh_key: ""
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
  ssh_key_path: ~/.ssh/id_rsa
  docker_socket: /var/run/docker.sock
  role: [worker]
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: true
    retention: 24h
    creation: 12h
    backup_config:
      interval_hours: 12
      retention: 6
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
  kubelet:
    image: ""
    extra_args:
      max-pods: 150
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    win_extra_args: {}
    win_extra_binds: []
    win_extra_env: []
network:
  plugin: "calico"
  options:
    canal_flannel_backend_type: vxlan
  mtu: 0
  node_selector: {}
  update_strategy: null
  tolerations: []
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: ""
  alpine: ""
  nginx_proxy: ""
  cert_downloader: ""
  kubernetes_services_sidecar: ""
  kubedns: ""
  dnsmasq: ""
  kubedns_sidecar: ""
  kubedns_autoscaler: ""
  coredns: ""
  coredns_autoscaler: ""
  nodelocal: ""
  kubernetes: ""
  flannel: ""
  flannel_cni: ""
  calico_node: ""
  calico_cni: ""
  calico_controllers: ""
  calico_ctl: ""
  calico_flexvol: ""
  canal_node: ""
  canal_cni: ""
  canal_controllers: ""
  canal_flannel: ""
  canal_flexvol: ""
  weave_node: ""
  weave_cni: ""
  pod_infra_container: ""
  ingress: ""
  ingress_backend: ""
  ingress_webhook: ""
  metrics_server: ""
  windows_pod_infra_container: ""
  aci_cni_deploy_container: ""
  aci_host_container: ""
  aci_opflex_container: ""
  aci_mcast_container: ""
  aci_ovs_container: ""
  aci_controller_container: ""
  aci_gbp_server_container: ""
  aci_opflex_server_container: ""
ssh_key_path: ""
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: false
enable_cri_dockerd: null
kubernetes_version: v1.24.9-rancher1-1
private_registries: []
ingress:
  provider: nginx
  options:
    use-forwarded-headers: "true"
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
  update_strategy: null
  http_port: 0
  https_port: 0
  network_mode: ""
  tolerations: []
  default_backend: null
  default_http_backend_priority_class_name: ""
  nginx_ingress_controller_priority_class_name: ""
  default_ingress_class: null
cluster_name: rke
cloud_provider:
  name: ""
prefix_path: ""
win_prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: bastion
  port: ""
  user: "root"
  ssh_key: ""
  ssh_key_path: "~/.ssh/id_rsa"
  ssh_cert: ""
  ssh_cert_path: ""
  ignore_proxy_env_vars: false
monitoring:
  provider: ""
  options: {}
  node_selector: {}
  update_strategy: null
  replicas: null
  tolerations: []
  metrics_server_priority_class_name: ""
restore:
  restore: false
  snapshot_name: ""
rotate_encryption_key: false
dns: null

Steps to Reproduce:

  • While on k8s 1.22.17, with StatefulSets running with volume mounts, upgrade the RKE Kubernetes version from 1.22.17 to 1.24.9.
  • After the upgrade, the StatefulSet pods and their respective PV-backed volumes are mounted as expected.
  • However, the event log fills with repeated warnings and errors reporting that the volume is already mounted elsewhere. The event spamming is heavy: ~3,800 such events over a 4-day observation period (a rough way to count these is sketched below).
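For reference, one rough way to gauge the event volume (assuming the standard FailedMount event reason kubelet uses for MountVolume failures):

  kubectl get events -A --field-selector reason=FailedMount
  kubectl get events -A --field-selector reason=FailedMount | wc -l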

Results:

  "log": "I0316 12:14:58.508866   22179 operation_generator.go:626] \"MountVolume.WaitForAttach succeeded for volume \\\"csivol-d8ed8f41a7\\\" (UniqueName: \\\"kubernetes.io/csi/csi-unity.dellemc.com^csivol-d8ed8f41a7-iSCSI-storagearray-sv_268814\\\") pod \\\"web-0\\\" (UID: \\\"98a7514b-414b-4ddf-ab23-572e606c7244\\\") DevicePath \\\"csi-240ae18dc9f334d896c39711cbf58fd23c51db537b8bd6bcaffc35edf25abe1f\\\"\" pod=\"default/web-0\"\n",
  "stream": "stderr",
  "time": "2023-03-16T12:14:58.513491344Z"
}
{
  "log": "I0316 12:14:59.411402   22179 kubelet.go:2182] \"SyncLoop (probe)\" probe=\"readiness\" status=\"\" pod=\"kube-system/coredns-autoscaler-79dcc864f5-pmwrf\"\n",
  "stream": "stderr",
  "time": "2023-03-16T12:14:59.412760158Z"
}
{
  "log": "I0316 12:14:59.412556   22179 kubelet.go:2182] \"SyncLoop (probe)\" probe=\"readiness\" status=\"ready\" pod=\"kube-system/coredns-autoscaler-79dcc864f5-pmwrf\"\n",
  "stream": "stderr",
  "time": "2023-03-16T12:14:59.412787709Z"
}
{
  "log": "I0316 12:15:00.152054   22179 kubelet.go:2182] \"SyncLoop (probe)\" probe=\"readiness\" status=\"\" pod=\"kube-system/calico-node-vfc9r\"\n",
  "stream": "stderr",
  "time": "2023-03-16T12:15:00.15371283Z"
}
{
  "log": "E0316 12:15:00.286052   22179 csi_attacher.go:344] kubernetes.io/csi: attacher.MountDevice failed: rpc error: code = Internal desc =  runid=34 device already in use and mounted elsewhere. Cannot do private mount\n",
  "stream": "stderr",
  "time": "2023-03-16T12:15:00.286173505Z"
}
{
  "log": "E0316 12:15:00.286292   22179 nestedpendingoperations.go:348] Operation for \"{volumeName:kubernetes.io/csi/csi-unity.dellemc.com^csivol-d8ed8f41a7-iSCSI-storagearray-sv_268814 podName: nodeName:}\" failed. No retries permitted until 2023-03-16 12:15:01.286274144 +0000 UTC m=+15.145119628 (durationBeforeRetry 1s). Error: MountVolume.MountDevice failed for volume \"csivol-d8ed8f41a7\" (UniqueName: \"kubernetes.io/csi/csi-unity.dellemc.com^csivol-d8ed8f41a7-iSCSI-storagearray-sv_268814\") pod \"web-0\" (UID: \"98a7514b-414b-4ddf-ab23-572e606c7244\") : rpc error: code = Internal desc =  runid=34 device already in use and mounted elsewhere. Cannot do private mount\n",
  "stream": "stderr",
  "time": "2023-03-16T12:15:00.28633199Z"
}

SURE-6124


bandak2 commented Mar 17, 2023

Can someone please take a look at this?
Our driver uses the kubelet path /var/lib/kubelet.
The update was done by running rke up --config ./cluster.yaml after changing kubernetes_version (sketched below).
If any additional info is required, please let us know.
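A minimal sketch of that procedure, using only values shown in this thread:

  # edit cluster.yaml: set kubernetes_version to the target release, e.g.
  #   kubernetes_version: v1.24.9-rancher1-1
  rke up --config ./cluster.yaml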


bandak2 commented Mar 23, 2023

Hi, can someone please help with the issue?


bandak2 commented Apr 3, 2023

This is in relation to our Dell CSI storage driver: an RKE upgrade leads to kubelet spamming mount errors for StatefulSets.
Our CSI driver conforms to the CSI spec, and we would like to understand what in the upgrade process issues mount calls for already-existing mounts. This is what produces the failed-mount error spamming in the kubelet logs.
If you need additional info, please let us know.


Tejeev commented Apr 4, 2023

Hi @bandak2,
I'm working on the internal ticket for this. Could you please attach information on the csi version and any configuration details that might help us reproduce?

Thank you,
TJ


bandak2 commented Apr 5, 2023

Hi TJ,

Our Dell csi-unity storage driver implements CSI spec 1.5.
The behavior can be reproduced by upgrading the RKE k8s version from 1.22.17 to 1.24.9 with our storage driver installed and StatefulSets running with dynamic PVC provisioning.
We were running RKE v1.3.18 for these upgrades.

  • While on RKE 1.3.18 (running k8s 1.22.17), we had our storage driver installed (Dell csi-unity version 2.4.0).
  • Created a StatefulSet (sample below).
  • The StatefulSet creates dynamic PVCs and binds them to the volumes created by the driver.
  • Everything is normal at this point.
  • Upgrade RKE k8s to 1.24.9 using rke commands.
  • After the upgrade succeeds, the StatefulSet pods and mounts are correct and as expected.
  • However, continuous errors are reported for the mounts (storage array ID masked in the output below):
    Error: MountVolume.MountDevice failed for volume \"csivol-d8ed8f41a7\" (UniqueName: \"kubernetes.io/csi/csi-unity.dellemc.com^csivol-d8ed8f41a7-iSCSI-storagearray-sv_268814\") pod \"web-0\" (UID: \"98a7514b-414b-4ddf-ab23-572e606c7244\") : rpc error: code = Internal desc = runid=34 device already in use and mounted elsewhere. Cannot do private mount\n",

sample statefulset used:

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  labels:
    app: nginx
spec:
  serviceName: "nginx"
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  minReadySeconds: 10
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
      storageClassName: unity-iscsi

@keerthi652

Hi TJ, do you have an update on this, or do you need anything from us?


Tejeev commented Apr 27, 2023

Hi Keerthi,
My apologies for the lapse in updates. Our engineer is still reviewing the issue and working to reproduce it. To give you an idea of what we're doing: he's checking whether this reproduces in environments with different setups and versions, such as an environment upgraded from 1.22 to 1.24, a fresh 1.24.9 install, and a newer 1.24 patch release we are preparing. Hopefully this will turn up what might be different on our end, or a common denominator among the failure states.

I'll see if there is any information we can provide him to speed up this part of the process.
If anything comes up that you may think is relevant, please don't hesitate to let us know.


Tejeev commented Apr 27, 2023

I've been reviewing things, and it may speed up our reproduction attempts if we could double-check the formatting of the StatefulSet you provided, so that any misunderstanding on our part doesn't propagate through our tests. Please review my edits below for accuracy:

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  - ports:
    port: 80
    name: web
    clusterIP: None
selector:
  app: nginx

---

apiVersion: apps/v1
  kind: StatefulSet
metadata:
  name: web
labels:
  app: nginx
spec:
  serviceName: "nginx"
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  minReadySeconds: 10
  template:
    metadata:
      labels:
        app: nginx
      spec:
        containers:
        - name: nginx
          image: registry.k8s.io/nginx-slim:0.8
          ports:
          - containerPort: 80
            name: web
          volumeMounts:
          - name: www
            mountPath: /usr/share/nginx/html
            volumeClaimTemplates:
          - metadata:
            name: www
            spec:
              accessModes: [ "ReadWriteOnce" ]
              resources:
                requests:
                  storage: 10Gi
                storageClassName: unity-iscsi

(Took me a second to figure out, but the code-block syntax for GitHub is three backticks: ```.)


bandak2 commented Apr 27, 2023

Thanks for the response @Tejeev.
We use volumeClaimTemplates in the StatefulSet manifest (a sibling of the template section under spec) to generate dynamic PVCs for the StatefulSet pods. Each PVC is provisioned by our csi-unity driver and then used for the volume mount. I think either way would work for the test.

  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
      storageClassName: unity-iscsi
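For reference, StatefulSet PVCs follow the <claimTemplate>-<statefulset>-<ordinal> naming convention, so with two replicas the resulting bindings can be checked with something like (assuming the default namespace):

  kubectl get pvc www-web-0 www-web-1
  kubectl get pv | grep csivol-   # backing volumes provisioned by csi-unity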


Tejeev commented Apr 28, 2023

Is this correct?

apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  - ports:
    port: 80
    name: web
    clusterIP: None
selector:
  app: nginx

---

apiVersion: apps/v1
  kind: StatefulSet
metadata:
  name: web
labels:
  app: nginx
spec:
  serviceName: "nginx"
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  minReadySeconds: 10
  template:
    metadata:
      labels:
        app: nginx
      spec:
        containers:
        - name: nginx
          image: registry.k8s.io/nginx-slim:0.8
          ports:
          - containerPort: 80
            name: web
          volumeMounts:
          - name: www
            mountPath: /usr/share/nginx/html
    volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
        storageClassName: unity-iscsi


bandak2 commented Apr 28, 2023

Seems like the indentation is a little off for the claim template. You can use the following, which works for me (an apply/verify sketch follows the manifest):

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  labels:
    app: nginx
spec:
  serviceName: "nginx"
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  minReadySeconds: 10
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry.k8s.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
      storageClassName: unity-iscsi
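For reference, to apply it and sanity-check the mount (the filename here is hypothetical):

  kubectl apply -f web-statefulset.yaml
  kubectl rollout status statefulset/web
  kubectl exec web-0 -- df -h /usr/share/nginx/html   # confirm the PVC is mounted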


Tejeev commented Apr 28, 2023

Thanks!


Tejeev commented Apr 28, 2023

I'm told we're having some difficulty securing access to the specific server hardware (and therefore also the CSI driver) being used here. Could you please help us investigate by trying the following:

  1. See if the same issue occurs on a different distribution of the same Kubernetes version. For example, try the same setup with kubeadm instead of RKE1.
  2. See if the same issue occurs after a fresh installation of K8s v1.24, or if it only happens on upgrade from v1.22 to v1.24.

The key here is to try the above steps on the same hardware with the same CSI drivers, as I think trying it on different hardware with different drivers wouldn't produce useful results for this particular issue.


bandak2 commented May 3, 2023

Apologies for the delayed response; it was a long weekend.
In response to the points you raised:

  1. For regular deployments, our configuration is based on kubeadm. We can give the scenario a try there.
  2. It only happens after an upgrade, when the remount triggers. We are finding that the STS mount is checked and returns success, yet the logs show the mount being retried with a different path even after it returns success.

We'll revert on point 1.


bandak2 commented May 17, 2023

We tried the upgrade scenario with RKE v1.3.20 from 1.22 through 1.24, and we hit the issue once we landed on 1.24.
It looks like the staging path for NodeStageVolume is different on 1.24 compared with k8s 1.22 and k8s 1.23, tested with the same version of our driver.
Here are the findings:

RKE v1.3.20 with k8s v1.22.17-rancher1-2
driver v2.4.0

NODE_STAGE_VOLUME staging path at this point:
StagingTargetPath=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/csivol-50d0f5ffdd/globalmount

Upgrading to v1.23.16-rancher2-2...
NODE_STAGE_VOLUME path remained the same:

StagingTargetPath:/var/lib/kubelet/plugins/kubernetes.io/csi/pv/csivol-50d0f5ffdd/globalmount
...
...
time="2023-05-17T13:17:22Z" level=debug arrayid=apm00213404195 runid=25 msg="volume already published to target" func="github.com/dell/csi-unity/service.publishVolume()" file="/go/src/csi-unity/service/mount.go:364"

...upgrading from v1.23.16-rancher2-2 to v1.24.13-rancher2-1:

StagingTargetPath=/var/lib/kubelet/plugins/kubernetes.io/csi/csi-unity.dellemc.com/faf27cb97b007ad80dd62a9ffe53c84c03f9ded373e527a41654ae32a31c14c7/globalmount

time="2023-05-17T13:35:41Z" level=info msg="/csi.v1.Node/NodeStageVolume: REP 0073: rpc error: code = Internal desc =  runid=73 device already in use and mounted elsewhere. Cannot do private mount"

We are not sure why the staging path changes once we reach k8s 1.24 (a side-by-side check is sketched below).
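One way to compare the two staging-path layouts side by side on an affected node (paths taken from the logs above):

  # pre-1.24 layout, keyed by PV name:
  ls /var/lib/kubelet/plugins/kubernetes.io/csi/pv/*/globalmount
  # 1.24 layout, keyed by driver name and a per-volume hash:
  ls /var/lib/kubelet/plugins/kubernetes.io/csi/csi-unity.dellemc.com/*/globalmount
  # active global mounts:
  mount | grep globalmount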

@bfbachmann

Hi @bandak2

I've been looking into this issue. As @Tejeev mentioned, it's difficult for us to reproduce because we don't have the Dell hardware required. From going through the code, I understand that this error is happening when NodeStageVolume is called on your CSI (unity) driver. I believe it's kubelet that is making this call and passing an unexpected value as the StagingTargetPath. Would you be able to check the kubelet logs to see what it's doing differently in 1.24 vs 1.22/1.23?


bandak2 commented May 23, 2023

We're trying this on vanilla k8s and will update our findings from our end.

@snasovich

@bandak2 , just curious if there is any update on it?


bandak2 commented Jun 5, 2023

Hi @snasovich, thanks for checking.
Yes, we've tried this on vanilla k8s (deployed using kubeadm) from 1.22 -> 1.23 -> 1.24 with our driver installed, and the issue is not reproducible there. The orchestrator calls NodeUnstageVolume correctly after the upgrade to 1.24 and then remounts with the new updated path via NodeStageVolume calls.
I'll share the logs later today after sanitizing them.
Just a note: during a kubeadm-style k8s upgrade we cordon and drain the nodes, following the procedure in the Kubernetes guide to upgrade one version at a time (roughly the per-node steps sketched after the link):
https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/
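A minimal sketch of those per-node steps, assuming the standard kubeadm drain/uncordon flow:

  kubectl cordon <node>
  kubectl drain <node> --ignore-daemonsets --delete-emptydir-data
  # upgrade kubeadm/kubelet on the node per the guide above, then:
  kubectl uncordon <node>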


bandak2 commented Jun 6, 2023

There are too many logs to sanitize, given that we are including the driver logs. We're reaching out to your team via a different channel where we can share them securely, without having to sanitize all of them.


bandak2 commented Jun 14, 2023

The logs for this have been provided through backend channels. Once you have some insights, please let us know.


bfbachmann commented Jun 20, 2023

Hi @bandak2. Thanks for the logs. I've spent some time digging through them, and the only meaningful thing I've found so far is (as you mentioned before) the new mount calls to a different path.

Before v1.24, all the mounts are to the mount point /var/lib/kubelet/plugins/kubernetes.io/csi/pv/krv11506-d05ec878ad/globalmount.
E.g.:

mount -t ext4 -o defaults /dev/dm-0 /var/lib/kubelet/plugins/kubernetes.io/csi/pv/krv11506-d05ec878ad/globalmount

After v1.24, I see that there are still mount calls that use the same mount point, but there are also new ones that use /var/lib/kubelet/plugins/kubernetes.io/csi/csi-unity.dellemc.com/4bb10b984d5ca1bf16a8c79733866361f7061d665cba703eef3307a3426d3b21/globalmount.
E.g.:

mount -t ext4 -o defaults /dev/dm-1 /var/lib/kubelet/plugins/kubernetes.io/csi/csi-unity.dellemc.com/4bb10b984d5ca1bf16a8c79733866361f7061d665cba703eef3307a3426d3b21/globalmount

I'm not sure whether this is a red herring, though, because these mounts all seem to succeed eventually. I did, however, notice that multipathd seems to be running on your nodes, and I'm wondering if you might be experiencing an issue similar to this one. Is this something you'd be able to check? (A quick way to inspect multipath state is sketched below.)
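For example, multipath state on a node can be inspected with the standard multipath-tools commands:

  systemctl status multipathd
  multipath -ll   # list multipath devices and the state of each path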


bandak2 commented Jun 21, 2023

Thanks @bfbachmann, I'll look through the linked issue to see if anything stands out on our end.
As for multipath, our multipath config is at /etc/multipath.conf on the node and has the following contents for Kubernetes or RKE:

defaults {
  user_friendly_names yes
  find_multipaths yes
}


bandak2 commented Jun 21, 2023

While we're at it: as you can see in the logs, during the k8s upgrade to 1.24 the older path was unpublished and then re-published with the new path; I believe the CO (container orchestrator) does that. These actions don't seem to happen while upgrading RKE, and the older path remains the StagingPath.
I'll check to see if there are any hints from the multipath aspect.

@bfbachmann

Hi @bandak2, did you find anything interesting on the multipath front?
