Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(backend): lock rows that are being streamed to pipeline #3073

Merged
merged 2 commits into from
Dec 5, 2024

Conversation

corneliusroemer
Copy link
Contributor

resolves #

preview URL:

Summary

Screenshot

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by an appropriate test.

@corneliusroemer corneliusroemer added the format_me Triggers github_actions to format website code on PR label Oct 24, 2024
Copy link
Member

@chaoran-chen chaoran-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice, looks like a good solution! I didn't verify though and it's not clear to me how we can best check it. Can we somehow write a test for this case? (cc @fengelniederhammer , @fhennig)

@fengelniederhammer
Copy link
Contributor

What's the problem that this tries to solve? A bit of context would be nice. Maybe we can come up with a reasonable test then.

@chaoran-chen
Copy link
Member

chaoran-chen commented Oct 25, 2024

Ah sorry: The background is in Slack: https://loculus.slack.com/archives/C05G172HL6L/p1729806766081959.

Cornelius found that the backend is actually returning the same unprocessed sequence entries multiple times if there are multiple parallel pipelines requesting data.

Copy link
Contributor

@anna-parker anna-parker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am approving as we are seeing these errors again on production, the lock type is defined here: https://www.postgresql.org/docs/current/transaction-iso.html#XACT-READ-COMMITTED

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row. In the case of SELECT FOR UPDATE and SELECT FOR SHARE, this means it is the updated version of the row that is locked and returned to the client.

and here: https://www.postgresql.org/docs/current/sql-select.html

To prevent the operation from waiting for other transactions to commit, use either the NOWAIT or SKIP LOCKED option. With NOWAIT, the statement reports an error, rather than waiting, if a selected row cannot be locked immediately. With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table. Note that NOWAIT and SKIP LOCKED apply only to the row-level lock(s) — the required ROW SHARE table-level lock is still taken in the ordinary way (see [Chapter 13](https://www.postgresql.org/docs/current/mvcc.html)). You can use [LOCK](https://www.postgresql.org/docs/current/sql-lock.html) with the NOWAIT option first, if you need to acquire the table-level lock without waiting.

This is exactly what we want for prepro

@fengelniederhammer
Copy link
Contributor

multiple parallel pipelines requesting data.

Is there a reliable way to reproduce the issue in a unit test?

@corneliusroemer
Copy link
Contributor Author

multiple parallel pipelines requesting data.

Is there a reliable way to reproduce the issue in a unit test?

I thought that was a question for you 😀

It's possible for sure, but is it easy enough? Should be doable:

  • make two identical requests for extract processed
  • make them block, or rate limit severely so they enter race condition
  • then release rate limit
  • observe result

This should trigger original issue reliably and should hopefully be fixed by this here

@corneliusroemer
Copy link
Contributor Author

I'll merge this now, it's still happening and we already don't have a test atm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
format_me Triggers github_actions to format website code on PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants