fix(backend): lock rows that are being streamed to pipeline #3073

corneliusroemer · 2024-10-24T23:05:32Z

resolves #

preview URL:

Summary

Screenshot

PR Checklist

All necessary documentation has been adapted.
The implemented feature is covered by an appropriate test.

chaoran-chen

Ah, nice, looks like a good solution! I didn't verify it, though, and it's not clear to me how we can best check it. Can we somehow write a test for this case? (cc @fengelniederhammer, @fhennig)

fengelniederhammer · 2024-10-25T13:09:31Z

What's the problem that this tries to solve? A bit of context would be nice. Maybe we can come up with a reasonable test then.

chaoran-chen · 2024-10-25T13:21:47Z

Ah sorry: The background is in Slack: https://loculus.slack.com/archives/C05G172HL6L/p1729806766081959.

Cornelius found that the backend is actually returning the same unprocessed sequence entries multiple times if there are multiple parallel pipelines requesting data.

anna-parker

I am approving, as we are seeing these errors again on production. The lock behaviour is documented here: https://www.postgresql.org/docs/current/transaction-iso.html#XACT-READ-COMMITTED

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row. In the case of SELECT FOR UPDATE and SELECT FOR SHARE, this means it is the updated version of the row that is locked and returned to the client.

and here: https://www.postgresql.org/docs/current/sql-select.html

To prevent the operation from waiting for other transactions to commit, use either the NOWAIT or SKIP LOCKED option. With NOWAIT, the statement reports an error, rather than waiting, if a selected row cannot be locked immediately. With SKIP LOCKED, any selected rows that cannot be immediately locked are skipped. Skipping locked rows provides an inconsistent view of the data, so this is not suitable for general purpose work, but can be used to avoid lock contention with multiple consumers accessing a queue-like table. Note that NOWAIT and SKIP LOCKED apply only to the row-level lock(s) — the required ROW SHARE table-level lock is still taken in the ordinary way (see [Chapter 13](https://www.postgresql.org/docs/current/mvcc.html)). You can use [LOCK](https://www.postgresql.org/docs/current/sql-lock.html) with the NOWAIT option first, if you need to acquire the table-level lock without waiting.

This is exactly what we want for prepro.
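To make the quoted SKIP LOCKED behaviour concrete, here is a small in-memory analogy. It is not the Loculus backend code, which this thread does not show: each row carries a `threading.Lock`, and a non-blocking `acquire` plays the role of `SKIP LOCKED`. The names `Row`, `claim_rows` and `release_rows`, and the table and column names in the SQL comment, are assumptions made for illustration only.

```python
import threading
from dataclasses import dataclass, field

# Hypothetical query shape (table and column names are assumptions):
#   SELECT accession FROM entries
#   WHERE status = 'UNPROCESSED'
#   ORDER BY accession LIMIT 3
#   FOR UPDATE SKIP LOCKED;


@dataclass
class Row:
    accession: str
    # One lock per row stands in for a PostgreSQL row-level lock.
    lock: threading.Lock = field(default_factory=threading.Lock)


def claim_rows(table: list[Row], limit: int) -> list[Row]:
    """Return up to `limit` rows whose lock could be taken immediately.

    Rows already locked by another consumer are skipped instead of
    waited for, which is the in-memory analogue of SKIP LOCKED.
    The caller releases the locks when its simulated transaction ends.
    """
    claimed = []
    for row in table:
        if len(claimed) == limit:
            break
        if row.lock.acquire(blocking=False):  # False means "locked, skip it"
            claimed.append(row)
    return claimed


def release_rows(rows: list[Row]) -> None:
    """End the simulated transaction by dropping all row locks."""
    for row in rows:
        row.lock.release()


table = [Row(f"SEQ_{i}") for i in range(6)]
first = claim_rows(table, limit=3)   # pipeline A, transaction still open
second = claim_rows(table, limit=3)  # pipeline B asks while A holds its rows
print([r.accession for r in first])
print([r.accession for r in second])
release_rows(first + second)
```

In this sketch the second consumer receives SEQ_3 to SEQ_5 rather than a second copy of SEQ_0 to SEQ_2, and it never waits for the first consumer to finish.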

fengelniederhammer · 2024-10-29T18:29:47Z

> multiple parallel pipelines requesting data.

Is there a reliable way to reproduce the issue in a unit test?

corneliusroemer · 2024-10-29T22:39:00Z

> > multiple parallel pipelines requesting data.
>
> Is there a reliable way to reproduce the issue in a unit test?

I thought that was a question for you 😀

It's possible for sure, but is it easy enough? Should be doable:

1. Make two identical extract-unprocessed requests.
2. Make them block, or rate-limit them severely, so that they enter a race condition.
3. Release the rate limit.
4. Observe the result.

This should trigger the original issue reliably, and the issue should hopefully be fixed by this PR.
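The steps above could be sketched roughly as follows. This is a toy model, not a test against the real backend: `NaiveBackend`, `LockingBackend` and `run_two_requests` are hypothetical names, a `threading.Barrier` takes the place of the "block, then release" step, and a single mutex stands in for the row locks.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class NaiveBackend:
    """Hypothetical endpoint that reads rows first and marks them afterwards."""

    def __init__(self, accessions, gate):
        self.status = {a: "UNPROCESSED" for a in accessions}
        self.gate = gate  # barrier used to hold both requests mid-flight

    def _read(self, limit):
        return [a for a, s in self.status.items() if s == "UNPROCESSED"][:limit]

    def _mark(self, picked):
        for a in picked:
            self.status[a] = "IN_PROCESSING"

    def extract_unprocessed(self, limit):
        picked = self._read(limit)
        self.gate.wait()  # block until both requests have read the same rows
        self._mark(picked)
        return picked


class LockingBackend(NaiveBackend):
    """Same endpoint, but read-and-mark is atomic (one mutex, not real row locks)."""

    def __init__(self, accessions, gate):
        super().__init__(accessions, gate)
        self._mutex = threading.Lock()

    def extract_unprocessed(self, limit):
        self.gate.wait()  # release both requests at the same moment
        with self._mutex:
            picked = self._read(limit)
            self._mark(picked)
        return picked


def run_two_requests(backend_cls, limit=3):
    """Fire two identical requests concurrently and return both results."""
    gate = threading.Barrier(2, timeout=5)
    backend = backend_cls([f"SEQ_{i}" for i in range(6)], gate)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(backend.extract_unprocessed, limit) for _ in range(2)]
        return [f.result() for f in futures]


naive_a, naive_b = run_two_requests(NaiveBackend)
fixed_a, fixed_b = run_two_requests(LockingBackend)
print("naive overlap:", sorted(set(naive_a) & set(naive_b)))
print("fixed overlap:", sorted(set(fixed_a) & set(fixed_b)))
```

Because the barrier only opens once both requests have finished reading, the naive variant hands out the same three entries twice on every run, while the locking variant returns two disjoint batches. A real test would need an equivalent hook in the backend or the database to hold both requests at the same point.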

corneliusroemer · 2024-12-05T11:15:09Z

I'll merge this now: it's still happening, and we don't have a test at the moment anyway.

fix(backend): lock rows that are being streamed to pipeline

85363f5

corneliusroemer requested a review from fengelniederhammer October 24, 2024 23:05

corneliusroemer added the format_me label (Triggers github_actions to format website code on PR) Oct 24, 2024

Automated backend code formatting

af4f07f

chaoran-chen reviewed Oct 25, 2024


anna-parker approved these changes Oct 29, 2024


corneliusroemer merged commit 9d6a506 into main Dec 5, 2024
1 check passed

corneliusroemer deleted the forupdate branch December 5, 2024 11:15

corneliusroemer mentioned this pull request Dec 5, 2024

Update Loculus version to ee77b4 pathoplexus/pathoplexus#313

Merged

anna-parker mentioned this pull request Dec 6, 2024

extract-unprocessed throws errors when parallelized and prepro crashes as result #3394

Open
