-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(backend): lock rows that are being streamed to pipeline #3073
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, nice, looks like a good solution! I didn't verify though and it's not clear to me how we can best check it. Can we somehow write a test for this case? (cc @fengelniederhammer , @fhennig)
What's the problem that this tries to solve? A bit of context would be nice. Maybe we can come up with a reasonable test then. |
Ah sorry: The background is in Slack: https://loculus.slack.com/archives/C05G172HL6L/p1729806766081959. Cornelius found that the backend is actually returning the same unprocessed sequence entries multiple times if there are multiple parallel pipelines requesting data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am approving as we are seeing these errors again on production, the lock type is defined here: https://www.postgresql.org/docs/current/transaction-iso.html#XACT-READ-COMMITTED
UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row. In the case of SELECT FOR UPDATE and SELECT FOR SHARE, this means it is the updated version of the row that is locked and returned to the client.
and here: https://www.postgresql.org/docs/current/sql-select.html
To prevent the operation from waiting for other transactions to commit, use either the NOWAIT or SKIP LOCKED option. With NOWAIT, the statement reports an error, rather than waiting, if a selected row cannot be locked immediately. With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table. Note that NOWAIT and SKIP LOCKED apply only to the row-level lock(s) — the required ROW SHARE table-level lock is still taken in the ordinary way (see [Chapter 13](https://www.postgresql.org/docs/current/mvcc.html)). You can use [LOCK](https://www.postgresql.org/docs/current/sql-lock.html) with the NOWAIT option first, if you need to acquire the table-level lock without waiting.
This is exactly what we want for prepro
Is there a reliable way to reproduce the issue in a unit test? |
I thought that was a question for you 😀 It's possible for sure, but is it easy enough? Should be doable:
This should trigger original issue reliably and should hopefully be fixed by this here |
I'll merge this now, it's still happening and we already don't have a test atm |
resolves #
preview URL:
Summary
Screenshot
PR Checklist