Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use advisory lock when assigning sequence IDs #401

Merged
merged 4 commits into from
Aug 29, 2024
Merged

Conversation

richardhuaaa
Copy link
Contributor

@richardhuaaa richardhuaaa commented Aug 29, 2024

Hoping for this to be an invisible change - relying on existing unit tests for verification.

Whenever we add a table that relies on a sequence for ordering, we should consider adding a Postgres stored function that performs insertions by taking an advisory lock first. This ensures that inserts on the table are processed serially.

For these stored functions, I've matched the params and param ordering to the existing queries. I also allowed unique constraint errors to bubble up rather than specifying ON CONFLICT, to match the existing queries.

The dev/up script also generates the protos, and I don't see a convenient way to exclude the protos for the replication project. However, I assume that at some point we will need them on this server, for forwards-compatibility purposes.

@richardhuaaa richardhuaaa marked this pull request as draft August 29, 2024 00:51
@richardhuaaa
Copy link
Contributor Author

Just realized the generate script doesn't actually re-generate the queries, so I haven't actually tested the new queries. Will fix that, oops!

@richardhuaaa richardhuaaa marked this pull request as ready for review August 29, 2024 01:00
-- Ensures that the generated sequence ID matches the insertion order
-- Only released at the end of the enclosing transaction - beware if called within a long transaction
PERFORM
pg_advisory_xact_lock(hashtext('group_messages_sequence'));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the way this is queried, do we actually care that the group_messages table IDs are globally monotonically increasing. What if we made the lock key the group_id instead?

Unlike with replication, no one is querying across multiple groups.

Same goes for welcome messages.

We could get away with doing the same for inbox logs too, even if there is a bit of a global namespace for those because of the address mapping. But that doesn't rely on the sequence_id being monotonically increasing, so I think it's fine to have that lock be scoped as well.

@richardhuaaa
Copy link
Contributor Author

richardhuaaa commented Aug 29, 2024

Great nuance @neekolas, this gives us more concurrency.

Some implementation notes:

  • We are using hashtext in other places in this repo too, so I chose to use the same thing. hashtext is faster than md5 and the result fits inside a 32-bit int, however it is a postgres-internal function that is not documented (I think it's used internally for indexes). That means its implementation may change between Postgres versions. I think that shouldn't cause any problems for us when used for a lock, but if we are concerned, we can always switch to md5.
  • The performance of hashtext is likely not a problem, especially given we are now locking per log, but at some point we should benchmark the throughput we can get per log with this locking strategy. If the hashing is a problem, worst case we can use a simpler hash that allows more collision, hardcode some constants, or generate the hash outside of the database.

@richardhuaaa richardhuaaa requested a review from neekolas August 29, 2024 18:43
-- Ensures that the generated sequence ID matches the insertion order
-- Only released at the end of the enclosing transaction - beware if called within a long transaction
PERFORM
pg_advisory_xact_lock(hashtext('group_messages_sequence'), hashtext(encode(group_id, 'hex')));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could cut down on the number of hashes by concatenating the two strings before hashing if we're concerned about perf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good option if perf becomes an issue. Separated it for this implementation to reduce collisions (especially against other tables)

@richardhuaaa richardhuaaa merged commit 987ef53 into main Aug 29, 2024
3 checks passed
@richardhuaaa richardhuaaa deleted the rich/serial-ids branch August 29, 2024 21:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants