docs: add troubleshooting section in MIGRATIONS.md #45

Merged
merged 1 commit into from
Oct 18, 2023
51 changes: 51 additions & 0 deletions architecture/MIGRATIONS.MD
@@ -24,3 +24,54 @@ $ date '+%Y%m%d%H%M%S'
### Transactional Migrations

Each migration file runs in a single transaction, which means that you cannot have, for example, multiple `CREATE INDEX CONCURRENTLY` statements in a given migration file. Instead, break each statement out into its own migration file.

## Troubleshooting

### Migration Table Locks

While running migrations, Bun (the database client) uses a simple locking mechanism: it locks an entire table by attempting to insert a row `(id, table_name)` into a table called `bun_migration_locks`. Because `table_name` is a unique key, only one process can hold the lock on a given table at a time. When the process finishes modifying the table, it deletes its lock row.

However, a problem occurs if the process dies while still holding a lock: the lock row is never deleted, and no other process is allowed to lock the table.
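
The failure mode above can be illustrated with a small in-memory simulation. This is only a sketch of the idea, not Bun's implementation: `lockTable`, `Acquire`, `Release`, and `ErrLocked` are hypothetical names invented for this example, and a Go map stands in for the unique key on `table_name`.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrLocked is a hypothetical stand-in for the unique-constraint
// violation the database returns when a lock row already exists.
var ErrLocked = errors.New("duplicate key: table already locked")

// lockTable is a hypothetical in-memory model of bun_migration_locks.
// The map key plays the role of the unique table_name column.
type lockTable struct {
	nextID int64
	rows   map[string]int64 // table_name -> id
}

func newLockTable() *lockTable {
	return &lockTable{nextID: 1, rows: map[string]int64{}}
}

// Acquire models the INSERT: it fails if a row for tableName exists.
func (t *lockTable) Acquire(tableName string) (int64, error) {
	if _, held := t.rows[tableName]; held {
		return 0, ErrLocked
	}
	id := t.nextID
	t.nextID++
	t.rows[tableName] = id
	return id, nil
}

// Release models the DELETE of a lock row by id.
func (t *lockTable) Release(id int64) {
	for name, rowID := range t.rows {
		if rowID == id {
			delete(t.rows, name)
		}
	}
}

func main() {
	locks := newLockTable()

	// Process A takes the lock, then "dies" without calling Release.
	staleID, _ := locks.Acquire("bun_migrations")

	// Process B is now rejected on every attempt.
	_, err := locks.Acquire("bun_migrations")
	fmt.Println("process B:", err)

	// Removing the stale row by hand unblocks process B.
	locks.Release(staleID)
	_, err = locks.Acquire("bun_migrations")
	fmt.Println("process B after cleanup:", err)
}
```

Nothing in this model expires a lock on its own, which is why a row left behind by a dead process blocks every later attempt until someone removes it.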
@DracoLi DracoLi Oct 18, 2023

where did you encounter this issue? was it the testnet db?

Member Author

yeah, on testnet. potentially because of my own rapid fire deploys 😄
noticed the migrations weren't running. turned on db query logging and noticed the service continually getting blocked on lock creation 🥴


If you find the migrations aren't running, check the following:

Enable logging by setting the following:
```sh
# first, make sure you're actually trying to run the migrations... :)
RUN_DATABASE_MIGRATIONS=true

# debug logging for the service logger
LOG_LEVEL=DEBUG
# enable database query logging
DATABASE_QUERY_LOGGING_ENABLED=true
```

With `LOG_LEVEL=DEBUG`, check that the service is attempting to run migrations. It will produce the following log when migrations are enabled:
```json
{
"level": "debug",
"time": "2023-10-17T20:46:17Z",
"caller": "/app/service/service.go:155",
"message": "running migrations on database"
}
```

Repeated database query logs that look like this indicate the problem is with the table locks:
```log
[bun] 20:46:58.822 INSERT 2.529ms INSERT INTO bun_migration_locks ("id", "table_name") VALUES (DEFAULT, 'bun_migrations') RETURNING "id" pgdriver.Error: ERROR: duplicate key value violates unique constraint "bun_migration_locks_table_name_key" (SQLSTATE=23505)
```

To resolve, manually connect to the database and delete the lock:
```sql
SELECT * FROM bun_migration_locks;
-- verify and locate the bad lock
-- | id    | table_name     |
-- | ----- | -------------- |
-- | 12345 | bun_migrations |

-- DANGEROUS DELETE OPERATION!
-- THINK BEFORE YOU PASTE!
DELETE FROM bun_migration_locks WHERE id = 12345;
```

The service, which is retrying the lock acquisition, should be able to acquire the lock and complete the migrations (successfully removing its lock after it finishes).
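
To show why no restart is needed after the manual delete, here is a minimal sketch of a retry loop. It assumes the service simply re-attempts the lock on a fixed delay; the actual retry logic is not shown in this document, and `acquireWithRetry`, `errLocked`, and the `tryLock` callback are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLocked is a hypothetical stand-in for the duplicate-key error.
var errLocked = errors.New("lock already held")

// acquireWithRetry calls tryLock until it succeeds, fails with an
// unexpected error, or the attempts run out. It returns the number
// of attempts made.
func acquireWithRetry(tryLock func() error, attempts int, delay time.Duration) (int, error) {
	for i := 1; i <= attempts; i++ {
		err := tryLock()
		if err == nil {
			return i, nil
		}
		if !errors.Is(err, errLocked) {
			return i, err // not a lock conflict, so stop retrying
		}
		time.Sleep(delay)
	}
	return attempts, fmt.Errorf("gave up after %d attempts: %w", attempts, errLocked)
}

func main() {
	staleLockPresent := true
	calls := 0

	// Simulated lock attempt: the stale row disappears just before
	// the third call, as if an operator had run the DELETE.
	tryLock := func() error {
		calls++
		if calls == 3 {
			staleLockPresent = false
		}
		if staleLockPresent {
			return errLocked
		}
		return nil
	}

	n, err := acquireWithRetry(tryLock, 5, 10*time.Millisecond)
	fmt.Println("attempts:", n, "err:", err)
}
```

Under this assumption, the first attempt after the stale row is removed succeeds, so the service picks up where it was blocked without being redeployed.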