Replication slot lost after switchover? #25
Comments
An `active` state of `true` on the standby indicates that the slot has not yet synchronized and is not safe for use. Hence, when the failover happened, the slot on the new primary was lost, which is expected. Did you see an error like the one below being emitted on the standby?

"still waiting for remote slot %s lsn ... and catalog xmin ... to pass local slot lsn ... and catalog xmin ..."
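A sketch of how to check this: the slot state described above can be inspected on the standby with a query such as the following (column list trimmed to the relevant fields; run on both nodes to compare):

```sql
-- On the standby: active = true means the slot is still syncing and,
-- per the note above, is not yet safe to use after a failover.
SELECT slot_name, slot_type, active, restart_lsn, catalog_xmin
FROM pg_replication_slots
WHERE slot_type = 'logical';
```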
That is correct, I see the message.
I am confused about this: if I have multiple slots that need to be synchronized and the first slot fails to synchronize, will the rest not be processed?
Any idea why it does not synchronize, or how to debug this?
It looks like the state does end up at the desired 'false' on the standby side after all. Perhaps some commit is needed first, as I was testing in an isolated setup without any mutations.
Hi @ashucoek, all that I can see in the logs is the "still waiting for remote slot" line:

```
2023-12-12 20:27:42.995 UTC,,,309,,6578b8aa.135,100,,2023-12-12 19:46:50 UTC,2/26,0,LOG,00000,"still waiting for remote slot fivetran_slot lsn (0/44B5620) and catalog xmin (751) to pass local slot lsn (0/800E820) and catalog xmin (755)",,,,,,,,,"pg_failover_slots worker","pg_failover_slots worker",,0
```
You need to have some activity on the primary so that the remote slot's LSN and catalog xmin can advance past the local slot's values.
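A hedged sketch of generating such activity on the primary (any committed write will do; the table name here is purely illustrative):

```sql
-- Hypothetical example: produce write activity so the remote slot's
-- LSN and catalog xmin move forward and the standby's wait can complete.
CREATE TABLE IF NOT EXISTS slot_sync_nudge (id int);
INSERT INTO slot_sync_nudge VALUES (1);
SELECT pg_switch_wal();  -- optionally force a WAL segment switch as well
```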
Situation: a PostgreSQL 13 master/standby setup on-premises using repmgr.
We recently added a logical standby, using AWS Database Migration Service (DMS) to replicate to a cloud instance.
To prevent replication from failing after a switchover, I installed pg_failover_slots on both the master and standby. After adding some pg_hba.conf rules, the logical replication slot also becomes visible on the standby node.
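For reference, a minimal sketch of the kind of pg_hba.conf entry involved, assuming a hypothetical subnet and user; the actual addresses, role, and auth method depend on the environment:

```
# Hypothetical example: allow the standby to connect to the primary
# so the failover-slots worker can read slot state.
host  replication  repmgr  10.0.0.0/24  scram-sha-256
```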
primary:
standby:
For some reason I have to stop the replication task in AWS first, or the primary instance will not shut down during a switchover (but that is unrelated to pg_failover_slots). I see the "active" state turn to false on the primary after stopping the replication task, but no change on the standby. After a switchover, the replication slot is lost on both instances, and the replication task goes into an error state after restart.
Any clue why this is not working?