
Failed backup shows as completed when failure to read a volume occurs #1032

Open
shreddedbacon opened this issue Dec 9, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@shreddedbacon

shreddedbacon commented Dec 9, 2024

Description

If a volume mounted into the backup pod is unreadable, K8up reports an error during the scan saying the volume cannot be read. Restic then moves on to the archival step, which fails as well. The result is an empty snapshot.

The problem is that this is treated as a successful backup, which IMO is wrong. If I've asked to back up a volume and the entire volume is unreadable, that is a failure.

Additional Context

We discovered this while investigating why a volume snapshot was empty even though we had received no backup failure alerts (see uselagoon/build-deploy-tool#361).

The permission on the volume meant the backup pod user was unable to access it at all.

I know you've mentioned that k8up_backup_restic_last_errors contains information on files that failed to back up, but in this case the entire volume is affected.

Inside the backup pod that K8up creates, the process runs as uid 65532, and the permissions on the volume do not allow that user to access it, so the backup scan fails:

bash-5.1$ id
uid=65532 gid=0(root) groups=0(root)
bash-5.1$ ls -alh /data/
total 28K    
drwxr-xr-x    3 root     root          19 Aug 22 21:44 .
drwxr-xr-x    1 root     root          54 Aug 22 21:44 ..
drwxrws---   12 10000    10001      30.0K Aug 21 21:49 nginx
bash-5.1$ ls -alh /data/nginx/
ls: can't open '/data/nginx/': Permission denied

No files are backed up at all in this case, yet the backup is still classed as a "success". Either of the two errors shown in the logs should result in a backup failure.

Logs

The first error is raised during the scan, and the subsequent archival step fails as well.

1.7243634935042348e+09	ERROR	k8up.restic.restic.backup.progress	/data/nginx during scan 	{"error": "error occurred during backup"}
github.com/k8up-io/k8up/v2/restic/logging.(*BackupOutputParser).out
	/home/runner/work/k8up/k8up/restic/logging/logging.go:156
github.com/k8up-io/k8up/v2/restic/logging.writer.Write
	/home/runner/work/k8up/k8up/restic/logging/logging.go:103
io.copyBuffer
	/opt/hostedtoolcache/go/1.19.2/x64/src/io/io.go:429
io.Copy
	/opt/hostedtoolcache/go/1.19.2/x64/src/io/io.go:386
os/exec.(*Cmd).writerDescriptor.func1
	/opt/hostedtoolcache/go/1.19.2/x64/src/os/exec/exec.go:407
os/exec.(*Cmd).Start.func1
	/opt/hostedtoolcache/go/1.19.2/x64/src/os/exec/exec.go:544
1.724363493509051e+09	INFO	k8up.restic.restic.backup.progress	progress of backup	{"percentage": "0.00%"}
1.7243634938520162e+09	ERROR	k8up.restic.restic.backup.progress	/data/nginx during archival 	{"error": "error occurred during backup"}
github.com/k8up-io/k8up/v2/restic/logging.(*BackupOutputParser).out
	/home/runner/work/k8up/k8up/restic/logging/logging.go:156
github.com/k8up-io/k8up/v2/restic/logging.writer.Write
	/home/runner/work/k8up/k8up/restic/logging/logging.go:103
io.copyBuffer
	/opt/hostedtoolcache/go/1.19.2/x64/src/io/io.go:429
io.Copy
	/opt/hostedtoolcache/go/1.19.2/x64/src/io/io.go:386
os/exec.(*Cmd).writerDescriptor.func1
	/opt/hostedtoolcache/go/1.19.2/x64/src/os/exec/exec.go:407
os/exec.(*Cmd).Start.func1
	/opt/hostedtoolcache/go/1.19.2/x64/src/os/exec/exec.go:544

Expected Behavior

If the volume is unreadable, I would expect the backup to fail, even if other parts of the backup succeed.

Steps To Reproduce

uselagoon/build-deploy-tool#361

Version of K8up

v2.5.2

Version of Kubernetes

v1.31.0

Distribution of Kubernetes

EKS, GCP, AKS

@shreddedbacon shreddedbacon added the bug Something isn't working label Dec 9, 2024
@shreddedbacon
Author

I realise that adding a podSecurityContext or podConfig that changes the user to one with access to the volume will fix this. I still think a volume that fails to scan and archive should be classed as a failure rather than a success.

@Kidswiss
Contributor

Hi @shreddedbacon

Thanks for opening this issue. Some background on why it currently behaves this way:

Restic (the tool K8up uses under the hood) keeps going with the backup even if it runs into a "permission denied" or other read error. It tracks these errors internally and reports a count of them at the end of the run. Restic then exits with exit code 3, which signals that a snapshot was created but may be incomplete because not all files could be read.

K8up currently treats exit code 3 as a success, but exposes the k8up_backup_restic_last_errors metric so that Prometheus can be used to decide whether a backup should be considered successful.

Having said that, there's room for improvement:

If Restic exits with code 3, K8up could catch that and set a special condition on the Backup object, something like "PartialBackupCompleted". That would make partial backups visible without needing the whole Prometheus setup.

@shreddedbacon
Copy link
Author

I still think that failing to read an entire directory should be classed as a failed backup.

Relying on the k8up_backup_restic_last_errors metric to catch an unreadable directory would have been useless here, because volumes often contain a few unreadable files even when the rest of the volume backs up successfully. How can you distinguish between the entire directory failing to be backed up and a few missing files?
