
CSI Backup fails upon provisioner and csi driver mismatch (everest-csi) #8277

Open

lestich opened this issue Oct 8, 2024 · 4 comments

Labels
Area/CSI (Related to Container Storage Interface support) · Needs triage (We need discussion to understand problem and decide the priority)

Comments

@lestich

lestich commented Oct 8, 2024

What steps did you take and what happened:
I tried to create a Velero CSI backup on my cluster, which runs on OpenTelekomCloud and uses everest-csi as the storage provider, but it didn't work. VolumeSnapshots are supported and work as expected when I create them via a VolumeSnapshot CR. Velero cannot handle the mismatch between the StorageClass provisioner and the driver in the VolumeSnapshotClass (see below). I tried to set the VolumeSnapshotClass explicitly via a PVC annotation (see the sketch after the logs), but it fails on this check:

https://github.com/vmware-tanzu/velero/blob/release-1.14/pkg/util/csi/volume_snapshot.go#L353

Logs:

"Didn't find VolumeSnapshotClass from PVC annotations: Incorrect VolumeSnapshotClass csi-disk-snapclass is not for driver everest-csi-provisioner" backup=velero/test-annotation3 cmd=/velero logSource="pkg/util/csi/volume_snapshot.go:313" pluginName=velero

"Error backing up item" backup=velero/test-annotation3 error="error executing custom action (groupResource=persistentvolumeclaims, namespace=default, name=hello-world-pvc): rpc error: code = Unknown desc = failed to get VolumeSnapshotClass for StorageClass csi-disk-topology-default: error getting VolumeSnapshotClass: failed to get VolumeSnapshotClass for provisioner everest-csi-provisioner, \n\t\tensure that the desired VolumeSnapshot class has the velero.io/csi-volumesnapshot-class label

What did you expect to happen:
Setting the VolumeSnapshotClass annotation on the PVC should lead to a functional backup (by ignoring the mismatch between provisioner and driver).

Anything else you would like to add:
StorageClass used:

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: csi-disk-topology-default
parameters:
  csi.storage.k8s.io/csi-driver-name: disk.csi.everest.io
  csi.storage.k8s.io/fstype: ext4
  everest.io/disk-volume-type: SAS
  everest.io/passthrough: "true"
provisioner: everest-csi-provisioner
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

VolumeSnapshotClass used:

apiVersion: snapshot.storage.k8s.io/v1
deletionPolicy: Delete
driver: disk.csi.everest.io
kind: VolumeSnapshotClass
metadata:
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
  labels:
    velero.io/csi-volumesnapshot-class: "true"
  name: otc-vsc

Backup used:

apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/resource-timeout: 10m0s
    velero.io/source-cluster-k8s-gitversion: v1.29.2-r0-29.0.11.8
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: 29+
  labels:
    velero.io/storage-location: default
  name: test-annotation3
  namespace: velero
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: false
  hooks: {}
  includedNamespaces:
  - default
  includedResources:
  - pv
  - pvc
  itemOperationTimeout: 4h0m0s
  metadata: {}
  snapshotMoveData: true
  storageLocation: default
  ttl: 720h0m0s

VolumeSnapshots work as expected with this (strange) configuration. It is preprovisioned by the cloud provider / everest-csi addon (apart from the annotations for the default class and the Velero label, which I added).
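
For comparison, a manually created VolumeSnapshot along these lines works fine (a minimal sketch; the snapshot name is made up, the PVC and class names are the ones from above):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: hello-world-snapshot
  namespace: default
spec:
  # references the VolumeSnapshotClass with driver disk.csi.everest.io
  volumeSnapshotClassName: otc-vsc
  source:
    persistentVolumeClaimName: hello-world-pvc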

Environment:

  • Velero version (use velero version): v1.14.1
  • velero/velero-plugin-for-aws:v1.10.0 (talking to otc s3)
  • Velero features (use velero client config get features): EnableCSI
  • Kubernetes version (use kubectl version): 1.29

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
@msfrucht
Contributor

msfrucht commented Oct 8, 2024

Applications that automatically choose a VolumeSnapshotClass for snapshots are expected to match on the StorageClass provisioner field, i.e. the VolumeSnapshotClass driver field should contain the same value. This is shown in the everest snapshot examples as well.

https://github.com/huaweicloud/huaweicloud-csi-driver/tree/master/examples/evs-csi-plugin/kubernetes/snapshot

Change these fields to match and the issue will be resolved.
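
For illustration, a sketch of that suggestion applied to the VolumeSnapshotClass from this issue; only the driver field changes (whether the everest snapshot controller accepts this value is an assumption):

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: otc-vsc
  labels:
    velero.io/csi-volumesnapshot-class: "true"
# driver now matches the StorageClass provisioner
driver: everest-csi-provisioner
deletionPolicy: Delete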

@ywk253100 ywk253100 added the Area/CSI Related to Container Storage Interface support label Oct 9, 2024
@ywk253100
Contributor

ywk253100 commented Oct 9, 2024

@lestich Is it possible to modify the StorageClass and VolumeSnapshotClass so that they match, as @msfrucht suggested, to work around the issue?

@ywk253100 ywk253100 added the Needs triage We need discussion to understand problem and decide the priority label Oct 9, 2024
@lestich
Author

lestich commented Oct 9, 2024

Thanks for the responses @ywk253100 @msfrucht .

I already tried the solution suggested by @msfrucht yesterday, unfortunately without success.

If I change the provisioner in the StorageClass CR to disk.csi.everest.io, it cannot provision PVs anymore (the CSI controller seems to not reconcile them; there is a status event saying the volume has to be provisioned 'by hand'). And if I change the driver in the VolumeSnapshotClass to 'everest-csi-provisioner', it cannot create snapshots anymore (in this case Velero doesn't fail early but times out).

I know that this is a very strange setup, or an anti-pattern: provisioner and driver should match. But in this case (and maybe in other setups) we have no influence on the fact that provisioner and driver are not the same. Wouldn't it make sense to allow setting an annotation that skips the check (referenced above) for such cases? Or why does the check exist at all at the point where you explicitly specify a VolumeSnapshotClass via annotation?

@msfrucht
Contributor

msfrucht commented Oct 9, 2024

That is odd. A CSI driver has to register a provisioner string. That implies that when the CSI driver was installed, the provisioner string was set to everest-csi-provisioner instead of the default "evs.csi.huaweicloud.com".
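
For what it's worth, a quick way to see which driver names are actually registered in the cluster (standard kubectl commands; the output depends on the installation):

# List the CSIDriver objects registered with the API server
kubectl get csidriver

# Show the driver names the kubelet has registered on each node
kubectl get csinode -o yaml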

It should be far safer to adjust the VolumeSnapshotClass driver field than the StorageClass.
