doc: sync command breakdown and rollback notes (#620)
* doc: sync command breakdown and rollback notes

* doc: update

* doc: update playbook and add restart instructions

* doc: typo
vjeeva authored Nov 15, 2024
1 parent cba6063 commit 0fa60e1
Showing 2 changed files with 54 additions and 23 deletions.
42 changes: 42 additions & 0 deletions docs/playbook.md
@@ -13,3 +13,45 @@ To remedy this issue, you can perform the following:
1. If you see the error with the entire DSN (including password and IP address or hostname), identify if the host is the **source** or **destination** database.
2. Once identified, run the following to PSQL into that host: `psql "$(belt <src/dst>-dsn <datacenter-name> <database-name>)"`
3. In that PSQL terminal, run the following to set the password according to the `node` configuration: `ALTER ROLE pglogical PASSWORD '<password-you-saw>';`

## How can I roll back?

**NOTE: The rollback process is not fully implemented in pgbelt. If issues surface only after writes have succeeded
in the target database, make every effort to solve them at the application level first!**

If you discover an application issue that requires a rollback to the old database, you can do so without data loss even after
writes have succeeded in the target database.

To perform a rollback you will need to begin another period of application downtime where neither
database receives any writes. Once you are sure downtime has begun, run the following:

    $ belt teardown-back-replication testdatacenter1 database1
    $ belt restore-logins testdatacenter1 database1

If you have lost the pgbelt config file in which these users' names were stored when you ran the revoke logins
command, some users' logins might not be restored here.

Things that will need manual resolution:

- Sequence values on the source database. You will need to copy these over from the target database; no `belt` command covers this yet.
- Tables without Primary Keys. Their data will need to be copied from the target database to the source; no `belt` command covers this yet.

After you are sure that sequences and tables without primary keys have been synchronized from the target
into the old source, point your application to the old source and your rollback is complete.
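
The manual sequence step could be approached along these lines. This is a minimal sketch, not part of pgbelt: `gen_setval_sql` is a hypothetical helper, and it assumes PostgreSQL 10 or later (for the `pg_sequences` view), a working `psql`, and sequence names that contain no double quotes. Review the generated statements before running them against the source.

```shell
#!/bin/sh
# Sketch only: gen_setval_sql is a hypothetical helper, not a belt command.
# It reads "schema|sequence|last_value" rows on stdin and prints one
# setval() statement for every sequence that has a value.
gen_setval_sql() {
    while IFS='|' read -r schema seq last; do
        # pg_sequences reports NULL (empty here) for sequences that were
        # never used; there is nothing to copy for those, so skip them.
        [ -n "$last" ] || continue
        # Third argument true marks the value as already handed out, so the
        # next nextval() on the source continues after it.
        printf "SELECT setval('\"%s\".\"%s\"', %s, true);\n" "$schema" "$seq" "$last"
    done
}

# Example wiring (commented out; reads from the target, writes to the source):
# psql "$(belt dst-dsn testdatacenter1 database1)" -At -F '|' \
#     -c "SELECT schemaname, sequencename, last_value FROM pg_sequences" \
#   | gen_setval_sql \
#   | psql "$(belt src-dsn testdatacenter1 database1)"
```

Writing the generated statements to a file first, instead of piping them straight into the source, gives you a chance to inspect them during the downtime window.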

## I started a pgbelt replication job and need to restart it from scratch. How can I restart a pgbelt migration?

The following is a general guide to restarting a pgbelt migration. This is useful if you have a failed migration, or if you need to restart a migration after a rollback.

Run the following commands:

    $ belt teardown-back-replication testdatacenter1 database1
    $ belt teardown-forward-replication testdatacenter1 database1
    $ belt teardown testdatacenter1 database1
    $ belt teardown testdatacenter1 database1 --full
    $ belt remove-constraints testdatacenter1 database1
    $ belt remove-indexes testdatacenter1 database1

Note that the first four commands remove all replication job setup from the databases. `remove-constraints` removes NOT VALID constraints from the target schema so that they don't cause failed inserts when you restart replication (these constraints must not exist during the initial setup). `remove-indexes` removes all indexes from the target schema to help speed up the initial bulk load. Running `remove-indexes` is optional; you may skip it if needed.

After running these commands, you can `TRUNCATE` the tables in the destination database and start the migration from the beginning. **Please take as much precaution as possible when running TRUNCATE, as it will delete all data in the tables. In particular, ensure you are running it on the correct database!**
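
The teardown sequence above could be wrapped so that it stops at the first failing command. The following is a sketch under stated assumptions: `run_belt`, `restart_migration`, `DRY_RUN` and `SKIP_INDEXES` are hypothetical names introduced here, not pgbelt features. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: run_belt and restart_migration are hypothetical wrappers.
# With DRY_RUN=1 (the default) commands are printed, not executed.
run_belt() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "belt $*"
    else
        belt "$@"
    fi
}

# Usage: restart_migration <datacenter-name> <database-name>
# The && chain stops at the first command that fails.
restart_migration() {
    dc="$1"; db="$2"
    run_belt teardown-back-replication "$dc" "$db" &&
    run_belt teardown-forward-replication "$dc" "$db" &&
    run_belt teardown "$dc" "$db" &&
    run_belt teardown "$dc" "$db" --full &&
    run_belt remove-constraints "$dc" "$db" &&
    # remove-indexes is optional; set SKIP_INDEXES=1 to leave indexes in place.
    { [ "${SKIP_INDEXES:-0}" = "1" ] || run_belt remove-indexes "$dc" "$db"; }
}
```

For example, `restart_migration testdatacenter1 database1` prints the six commands for review, and `DRY_RUN=0 restart_migration testdatacenter1 database1` executes them. The `TRUNCATE` step is deliberately left out of the wrapper so that it remains a manual, double-checked action.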
35 changes: 12 additions & 23 deletions docs/quickstart.md
@@ -187,6 +187,16 @@ Therefore the next command will do the following:
$ belt sync testdatacenter1 database1
```

If the above command fails, you can diagnose and run the individual steps with the following commands:

- `sync-sequences` - reads sequence values from SRC and sets them on DST at the time of command execution
- `dump-tables` - dumps only tables without Primary Keys
- `load-tables` - loads the tables from the `dump-tables` command (found on disk) into the DST DB
- `dump-constraints` - dumps NOT VALID constraints from your SRC DB schema onto disk
- `load-constraints` - loads NOT VALID constraints from disk into your DST DB schema
- `validate-data` - checks 100 random rows and the last 100 rows of every table involved in the replication job, and ensures they all match exactly
- `analyze` - runs ANALYZE on the database
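
One way to find the failing step is to run the commands above one at a time and stop at the first error. This is a sketch, not a pgbelt feature: `run_sync_steps` and `DRY_RUN` are hypothetical names, and it assumes that each step takes the same datacenter and database arguments as `belt sync`, that the listed order is the order to run them in, and that the constraint dump step is spelled `dump-constraints`.

```shell
#!/bin/sh
# Sketch only: run_sync_steps is a hypothetical wrapper, not a belt command.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
# Usage: run_sync_steps <datacenter-name> <database-name>
run_sync_steps() {
    dc="$1"; db="$2"
    for step in sync-sequences dump-tables load-tables \
                dump-constraints load-constraints validate-data analyze; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "belt $step $dc $db"
        elif ! belt "$step" "$dc" "$db"; then
            # Report which step broke and stop, so it can be fixed and rerun.
            echo "step failed: $step" >&2
            return 1
        fi
    done
}
```

Running `DRY_RUN=0 run_sync_steps testdatacenter1 database1` executes the steps and names the first one that fails on stderr.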

## Step 8: Enable write traffic to the destination host

Enabling write traffic to the destination host is done outside of PgBelt, with your application.
@@ -202,27 +212,6 @@ The first command will tear down all replication jobs if still running. At this

The second command will run through the steps of the first command and then drop the `pglogical` extension from the database. It is separated out because the extension drop tends to hang if it runs right after the previous steps. When run separately, the DROP command will likely run without hanging, or in significantly less time.

# (Optional) Rolling Back

**NOTE: The rollback process is not fully implemented in pgbelt. You should make every effort to solve
issues that surface only after writes have succeeded in the target database at the application level first!**

If you discover an application issue that requires a rollback to the old database, you can do so without data loss even after
writes have succeeded in the target database.

To perform a rollback you will need to begin another period of application downtime where neither
database receives any writes. Once you are sure downtime has begun, run the following:

$ belt teardown-back-replication testdatacenter1 database1
$ belt restore-logins testdatacenter1 database1

If you've lost the pgbelt config file where these users' names were stored when you ran the revoke logins
command, some users might be missed here.

Things that will need manual resolution:

- Sequence values on the source database
- Tables without Primary Keys will need to be updated
# Final Notes

After you are sure that sequences and tables without primary keys have been synchronized from the target
into the old source, point your application to the old source and your rollback is complete.
Please note that instructions for rolling back and restarting a migration are now in the playbook in this directory. Please refer to those for more information.
