Lagoon schedules conflicting backup checks #325

Open
smlx opened this issue Jun 10, 2024 · 1 comment

Comments

smlx commented Jun 10, 2024

Lagoon has created backup schedules for two different environments of the same project. Here is the diff showing an identical schedule in two different namespaces:

 apiVersion: backup.appuio.ch/v1alpha1
 kind: Schedule
 metadata:
   name: k8up-lagoon-backup-schedule
-  namespace: foo-staging
+  namespace: foo-pr-1037
 spec:
   backend:
     repoPasswordSecretRef:
       key: repo-pw
       name: baas-repo-pw
     s3:
       bucket: baas-cluster-id0/baas-foo
   backup:
     resources: {}
     schedule: 23 1 * * *
   check:
     resources: {}
     schedule: 23 7 * * 1
   prune:
     resources: {}
     retention:
       keepDaily: 7
       keepMonthly: 1
       keepWeekly: 6
     schedule: 23 4 * * 0

These schedules cause two checks to run at the same time, one in each namespace. This sometimes works if the repository is small and the check is quick, but it often fails because restic takes an exclusive lock on the repository during a check. If the check that wins the race for the lock runs longer than the retry window of the other check jobs (which seems to be 5 retries over ~2 minutes), the other checks always fail.
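
To make the race concrete, here is a rough model of the waiting check job. The 5 retries over ~120 seconds are only my estimate from observed behaviour, and the even spacing between retries is an assumption, not something confirmed from the k8up or restic source. The function name is made up for this illustration.

```python
def waiting_check_succeeds(lock_held_seconds: float,
                           retries: int = 5,
                           retry_window_seconds: float = 120.0) -> bool:
    """Return True if the losing check job gets the lock before its retries run out.

    Assumption: retries are evenly spaced and the last one happens at the
    end of the retry window. The real backoff may differ.
    """
    interval = retry_window_seconds / retries
    for attempt in range(1, retries + 1):
        # The lock is free once the winning check has finished.
        if attempt * interval >= lock_held_seconds:
            return True
    return False


small_repo = waiting_check_succeeds(lock_held_seconds=30)    # quick check
large_repo = waiting_check_succeeds(lock_held_seconds=600)   # 10 minute check
```

Under these assumptions the waiting job only succeeds when the winning check finishes inside the retry window, which matches what we see: small repositories pass, large ones fail every time.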

The same problem exists with prune schedules because that command also takes an exclusive lock.

Backups are not affected because that command only takes an append lock.
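
The conflict described above could be detected with something like the following sketch. It groups schedules by repository bucket and reports any check or prune cron expression that appears in more than one namespace. The flat dict layout and the `find_conflicts` name are hypothetical simplifications of the Schedule spec, and it only catches identical cron strings, not runs that merely overlap in time.

```python
from collections import defaultdict


def find_conflicts(schedules):
    """Return {(bucket, job, cron): [namespaces]} for jobs scheduled identically
    against the same repository from more than one namespace."""
    groups = defaultdict(list)
    for s in schedules:
        # Backups are left out because they do not take an exclusive lock.
        for job in ("check", "prune"):
            cron = s.get(job)
            if cron:
                groups[(s["bucket"], job, cron)].append(s["namespace"])
    return {key: sorted(ns) for key, ns in groups.items() if len(ns) > 1}


# Values taken from the diff above, flattened into a hypothetical dict shape.
schedules = [
    {"namespace": "foo-staging", "bucket": "baas-cluster-id0/baas-foo",
     "check": "23 7 * * 1", "prune": "23 4 * * 0"},
    {"namespace": "foo-pr-1037", "bucket": "baas-cluster-id0/baas-foo",
     "check": "23 7 * * 1", "prune": "23 4 * * 0"},
]

conflicts = find_conflicts(schedules)
for (bucket, job, cron), namespaces in conflicts.items():
    print(f"{job} '{cron}' on {bucket} is scheduled in: {', '.join(namespaces)}")
```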

Ideas for solving the issue:

  • Maybe Lagoon should only add a check schedule to a single environment (the first production env in the project?), and then add a prune schedule to the first development env and the first production env, since the two types can have differing policies.
  • Somehow ensure that check and prune for each env run at different times. Though this seems difficult and pointless, since each command only needs to run once per repository?
  • Lagoon maintains a special namespace per-project or per-cluster which has a Schedule for each repository containing the prod and dev check and prune schedules?
  • Something else???
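
A rough sketch of the first idea, just to make it concrete. Everything here is hypothetical: the `assign_lock_jobs` name, the `(namespace, env_type)` tuples, the `foo-main` namespace, and the fallback to the first environment when a project has no production env. It is not how Lagoon currently builds schedules.

```python
def assign_lock_jobs(environments):
    """Decide which jobs each namespace's Schedule should contain.

    environments: list of (namespace, env_type) in creation order, where
    env_type is "production" or "development".
    Every environment keeps its backup. Prune goes to the first environment
    of each type. Check goes to the first production environment, or to the
    first environment overall if there is no production one (assumption).
    """
    jobs = {ns: {"backup"} for ns, _ in environments}

    # First environment seen for each type owns that type's prune schedule.
    prune_owner = {}
    for ns, env_type in environments:
        prune_owner.setdefault(env_type, ns)
    for ns in prune_owner.values():
        jobs[ns].add("prune")

    check_owner = prune_owner.get("production")
    if check_owner is None and environments:
        check_owner = environments[0][0]
    if check_owner is not None:
        jobs[check_owner].add("check")
    return jobs


plan = assign_lock_jobs([
    ("foo-main", "production"),      # hypothetical production namespace
    ("foo-staging", "development"),
    ("foo-pr-1037", "development"),
])
```

With this input only `foo-main` gets a check, `foo-main` and `foo-staging` get a prune, and `foo-pr-1037` only runs backups. Note this does not by itself stop the two prune owners from colliding with each other if they share a repository and a cron time.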
@shreddedbacon

Yeah, this will need some thinking about.
