
how to perform a manual failover without applying any outstanding WAL? #3957

Open · mzwettler2 opened this issue Jul 12, 2024 · 2 comments

@mzwettler2
We have a simple configuration with 2 replicas (1 primary + 1 standby).

We configured the standby to run 3 hours behind the primary:

  • recovery_min_apply_delay: '3h' # standby lags 3 hours behind the primary
  • synchronous_mode: false # asynchronous replication
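For context, a delayed standby like this is typically set via Patroni's dynamic configuration; in a Crunchy PGO v5 `PostgresCluster` that would look roughly like the following sketch (the cluster name is a placeholder, and the exact field layout should be checked against the PGO and Patroni docs):

```yaml
# Sketch only: delayed standby via Patroni dynamic configuration in PGO v5.
# recovery_min_apply_delay is a PostgreSQL parameter; synchronous_mode is a
# top-level Patroni setting.
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo          # placeholder cluster name
spec:
  patroni:
    dynamicConfiguration:
      synchronous_mode: false          # asynchronous replication
      postgresql:
        parameters:
          recovery_min_apply_delay: '3h'   # standby replays WAL 3 hours late
```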

In case there is a logical problem on the primary (wrong data processing, a botched application upgrade, …) we want to perform a manual failover to the standby, which still contains the old, correct data. That means the standby must not apply any of the outstanding WAL within the 3-hour delay window in the event of a manual failover.

I could not find any working solution. Whatever I tried, the standby first applied all outstanding WAL (which I don't want) and was only promoted afterwards. Any idea how to get this working?

@andrewlecuyer (Collaborator)

Hi @mzwettler2, if you want to restore your data back to a specific time (e.g. three hours prior, specifically due to an issue with your data), this sounds like a Disaster Recovery (DR) scenario. More specifically, it sounds like you want a point-in-time recovery (PITR), as discussed in the following docs:

https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/backups-disaster-recovery/disaster-recovery#perform-an-in-place-point-in-time-recovery-pitr

I therefore recommend this as the best/safest way to meet your needs/use-case.
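For reference, an in-place PITR in PGO is configured roughly as sketched below, per the linked docs; the repo name and target timestamp here are placeholders, and the restore is then triggered by annotating the `PostgresCluster`:

```yaml
# Sketch based on the linked PGO disaster-recovery docs; repoName and
# --target value are illustrative placeholders.
spec:
  backups:
    pgbackrest:
      restore:
        enabled: true
        repoName: repo1
        options:
        - --type=time
        - --target="2024-07-12 06:00:00+00"
```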

@mzwettler2 (Author)

Hi @andrewlecuyer, thanks for your answer.

We have to be back online very quickly in the event of an error, and the database is very large, so a PITR would take too long. That's why we want to work with a lagging standby database, which is a very common use case; we also do this in classic database operation. Unfortunately, the Crunchy implementation does not currently enable it on Kubernetes.
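For comparison, the classic approach on a plain (non-Kubernetes) delayed standby would be something like the following sketch, using standard PostgreSQL functions; note the caveat in the comments about what promotion does with already-received WAL:

```shell
# On the delayed standby: pause WAL replay immediately so no further
# changes from the primary are applied (standard PostgreSQL functions).
psql -c "SELECT pg_wal_replay_pause();"
psql -c "SELECT pg_is_wal_replay_paused();"

# Caveat: a plain promotion (pg_ctl promote, or SELECT pg_promote())
# still replays all WAL the standby has already received before it
# promotes. Stopping at an earlier point generally requires a recovery
# target, e.g. recovery_target_time together with
# recovery_target_action = 'promote'.
```

This is exactly the behavior the issue describes: pausing replay alone is not enough, because promotion itself drains the outstanding WAL unless a recovery target is in play.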
