
Fatal: unable to open config file results in PartiallyFailed Backup #8263

Open
amrap030 opened this issue Oct 4, 2024 · 3 comments


amrap030 commented Oct 4, 2024

What steps did you take and what happened:

Unfortunately my backups end up being PartiallyFailed due to the following error:

Errors:
  Velero:   message: /pod volume backup failed: data path backup failed: error running restic backup command restic backup --repo=s3:https://***.net/<bucketname>/velero/restic/kube-system --password-file=/tmp/credentials/velero/velero-repo-credentials-repository-password --cache-dir=/scratch/.cache/restic . --tag=pod-uid=1927b692-dda3-4994-b047-335921d6dc2c --tag=volume=socket-dir --tag=backup=velero-daily-20241004171637 --tag=backup-uid=4afba32a-2995-48e1-bd80-cc811de09aeb --tag=ns=kube-system --tag=pod=openstack-cinder-csi-controllerplugin-7f8cf7f5cb-r8ppl --host=velero --json with error: exit status 1 stderr: Fatal: unable to open config file: Stat: Get "https://***.net/<bucketname>/?location=": dial tcp: lookup ***.net: i/o timeout
Is there a repository at the following location?
s3:https://***.net/<bucketname>/velero/restic/kube-system

However, when I look into the bucket with an S3 viewer, the repository /velero/restic/kube-system does exist, and it contains the config file along with the snapshots etc.

I have already tried various proxy settings, since I run this on-premise and the S3 bucket is an on-premise enterprise object storage, but without success. Since the backup files are uploaded to the S3 bucket just fine, I assume the proxy settings are not the problem. I also installed restic on my local machine and verified the repository via restic -r s3:https://***.net/<bucketname>/velero/restic/kube-system snapshots, which works just fine.
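Since the repository resolves fine from my local machine but not from inside the cluster, in-cluster DNS seems to be the first thing to check. A minimal sketch of how one could test resolution from the pod network (the busybox image tag is an assumption, and the hostname is elided as in the logs above):

```sh
# Throwaway pod on the cluster network; tries to resolve the object-storage
# endpoint the same way the Velero/node-agent pods would
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
  -- nslookup ***.net
```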

Additionally, I am using the velero/velero-plugin-for-aws:v1.9.0 plugin, since this is an S3-compatible storage.
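For context, a BackupStorageLocation for an S3-compatible endpoint typically looks like the sketch below; bucket and endpoint are placeholders matching the paths in the error above, and the region value is usually a dummy for non-AWS stores:

```yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  name: default
  namespace: velero
spec:
  provider: aws
  objectStorage:
    bucket: <bucketname>
  config:
    region: <region>           # often a dummy value for S3-compatible stores
    s3ForcePathStyle: "true"   # path-style addressing for non-AWS endpoints
    s3Url: https://***.net     # the on-premise endpoint from the error above
```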

Since I am running everything in our on-premise environment, I would rather not attach the debug information bundle, as it might contain sensitive data.

What did you expect to happen:

I expected the backup to complete just fine without being PartiallyFailed.

The following information will help us better understand what's going on:

If you are using velero v1.7.0+:
Please use velero debug --backup <backupname> --restore <restorename> to generate the support bundle, and attach to this issue, more options please refer to velero debug --help

If you are using earlier versions:
Please provide the output of the following commands (Pasting long output into a GitHub gist or other pastebin is fine.)

  • kubectl logs deployment/velero -n velero
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
  • velero backup logs <backupname>
  • velero restore describe <restorename> or kubectl get restore/<restorename> -n velero -o yaml
  • velero restore logs <restorename>

Anything else you would like to add:

Environment:

  • Velero version (use velero version): 1.14.1
  • Velero features (use velero client config get features): n/a
  • Kubernetes version (use kubectl version): 1.31
  • Kubernetes installer & version: Juju Charms
  • Cloud provider or hardware configuration: Openstack
  • OS (e.g. from /etc/os-release): n/a

Vote on this issue!

This is an invitation to the Velero community to vote on issues; you can see the project's top-voted issues listed here.
Use the "reaction smiley face" up to the right of this comment to vote.

  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
Lyndon-Li added the area/fs-backup and Restic (relates to the restic integration) labels on Oct 6, 2024
Lyndon-Li (Contributor) commented

Could you try the kopia path instead? The restic path is being deprecated, so we are not going to work on the restic path for troubleshooting or enhancements.
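For reference, the uploader is selected at Velero install time; a minimal sketch of moving to kopia (every flag besides --uploader-type is an assumption about the existing deployment):

```sh
# kopia has been the default uploader since Velero v1.10; set it explicitly
# when re-installing. Bucket/endpoint placeholders follow the logs above.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --use-node-agent \
  --uploader-type=kopia \
  --bucket <bucketname> \
  --backup-location-config region=<region>,s3ForcePathStyle=true,s3Url=https://***.net
```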


amrap030 commented Oct 8, 2024

@Lyndon-Li Yes, I will try that and post my results.


amrap030 commented Oct 9, 2024

@Lyndon-Li with kopia I am getting a similar error:

Errors:
  Velero:    message: /pod volume backup failed: error to initialize data path: error to boost backup repository connection default-kube-system-kopia: error to connect backup repo: error to connect to storage: error retrieving storage config from bucket "expcs3mbvd-uptime": Get "https://***.net/expcs3mbvd-uptime/velero/kopia/kube-system/.storageconfig": dial tcp: lookup ***.net: i/o timeout
  Cluster:    <none>
  Namespaces: <none>

Namespaces:
  Included:  *
  Excluded:  velero

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero

TTL:  168h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2024-10-09 10:10:59 +0200 CEST
Completed:  2024-10-09 10:16:19 +0200 CEST

Expiration:  2024-10-16 10:10:59 +0200 CEST

Total items to be backed up:  963
Items backed up:              963

Backup Volumes:
  Velero-Native Snapshots: <none included>

  CSI Snapshots: <none included>

  Pod Volume Backups - kopia:
    Completed:
      argocd/argocd-application-controller-0: argocd-home
      argocd/argocd-applicationset-controller-57f56b4dd5-q4j5f: gpg-keyring, tmp
      argocd/argocd-dex-server-65db84595d-8btc8: dexconfig, static-files
      argocd/argocd-server-6587765cbb-qxdk9: plugins-home, tmp
      kube-system/metrics-server-v0.7.1-685874c7b8-vx464: tmp-dir
      monitoring/kube-prometheus-stack-grafana-79d64d6566-v9kp5: sc-dashboard-volume, sc-datasources-volume, sc-plugins-volume, storage
      monitoring/prometheus-kube-prometheus-stack-prometheus-0: config-out, prometheus-kube-prometheus-stack-prometheus-db
      monitoring/uptime-kuma-67bdd4dd-bkwkv: storage
      networking/traefik-7bb494677b-nf74h: data, tmp
      trivy-system/trivy-operator-6758798dc6-mn4kv: cache-policies
    Failed:
      kube-system/openstack-cinder-csi-controllerplugin-7f8cf7f5cb-r8ppl: socket-dir
HooksAttempted:  0
HooksFailed:     0

Still not sure why the error occurs, because the directory kube-system actually exists in the bucket.
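Since both the restic and the kopia path fail with the same dial tcp: lookup ***.net: i/o timeout, the repository layout is apparently not the problem; the node-agent pods simply cannot resolve the endpoint. A sketch of pointing those pods at a resolver that knows the internal zone (the DaemonSet is named node-agent in Velero >= 1.10; the nameserver IP is a placeholder):

```sh
# Give the node-agent pods an explicit upstream resolver instead of the
# cluster default; kubectl patch accepts YAML payloads like this one.
kubectl patch daemonset node-agent -n velero --type merge -p '
spec:
  template:
    spec:
      dnsPolicy: None
      dnsConfig:
        nameservers: ["<internal-dns-ip>"]
'
```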
