
Retry updating conda packages #976

Draft · wants to merge 2 commits into main

Conversation

@soapy1 (Contributor) commented Nov 11, 2024

Fixes #852

Description

The issue described in #852 arises because part of the environment build process executes the action_add_conda_prefix_package action. This action tries to insert all of the environment's packages into the database. If these packages overlap with any of the packages inserted by task_update_conda_channel (which is common on a first run of the task), we get the integrity error.

This PR adds a retry to the part of the task_update_conda_channel task that updates the conda_package table. So, if an integrity error occurs due to a package getting inserted by another process while a bulk update is happening, then the whole process for inserting the conda packages will try again. Subsequent bulk inserts are expected to work, since part of the update process is to only insert packages that haven't already been added.
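The retry behavior described above can be sketched as a decorator. This is an illustrative mock, not conda-store's implementation: the `IntegrityError` class stands in for `sqlalchemy.exc.IntegrityError`, and the decorator arguments mirror the `retry_on_errors` signature added in this PR.

```python
import functools

class IntegrityError(Exception):
    """Stand-in for sqlalchemy.exc.IntegrityError (illustrative only)."""

def retry_on_errors(allowed_retries=1, on_errors=(), logger=None):
    """Re-run the decorated function if it raises one of `on_errors`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempts = allowed_retries + 1  # first try + retries
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except on_errors:
                    if attempt == attempts - 1:
                        raise  # out of retries; propagate the error
                    if logger:
                        logger.warning("retrying %s", func.__name__)
        return wrapper
    return decorator
```

With `allowed_retries=1`, a bulk insert that collides with a concurrent per-package insert gets exactly one more chance; the second attempt is expected to succeed because the update only inserts packages not already present.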

Other options I looked at to resolve this issue:

  • Using upsert statements instead of insert statements.
    • sqlalchemy does not provide a platform-agnostic way to do upserts. Since conda-store is meant to be database-platform agnostic, I don't think this is a great option.
  • Locking the conda_package table against writes while the bulk insert is happening (briefly investigated).
    • I don't think this is a great option either, since it's pretty heavy-handed and may have unintended consequences as the system evolves.
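For context on the upsert option: upsert syntax differs by dialect (SQLite and PostgreSQL use `ON CONFLICT`, MySQL uses `ON DUPLICATE KEY UPDATE`), which is why a portable SQLAlchemy upsert is awkward. A minimal SQLite illustration, not conda-store code; the table shape here is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conda_package (name TEXT PRIMARY KEY, version TEXT)")
conn.execute("INSERT INTO conda_package VALUES ('numpy', '1.26')")

# A second plain INSERT of the same key would raise IntegrityError;
# ON CONFLICT ... DO NOTHING turns the duplicate into a no-op instead.
conn.execute(
    "INSERT INTO conda_package VALUES ('numpy', '2.0') "
    "ON CONFLICT(name) DO NOTHING"
)
row = conn.execute(
    "SELECT version FROM conda_package WHERE name='numpy'"
).fetchone()
# the preexisting row is kept
```

The same statement would have to be rewritten per backend, which is the portability problem noted above.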

Pull request checklist

  • Did you test this change locally?
  • Did you update the documentation (if required)?
  • Did you add/update relevant tests for this change (if required)?


netlify bot commented Nov 11, 2024

Deploy Preview for conda-store canceled.

🔨 Latest commit 1510115
🔍 Latest deploy log https://app.netlify.com/sites/conda-store/deploys/67328ca67680630008d61efa


netlify bot commented Nov 11, 2024

Deploy Preview for conda-store canceled.

🔨 Latest commit c16061d
🔍 Latest deploy log https://app.netlify.com/sites/conda-store/deploys/67328fccf119570008a467c6

@soapy1 soapy1 added the type: bug 🐛 Something isn't working label Nov 11, 2024
@soapy1 soapy1 marked this pull request as ready for review November 12, 2024 05:30
@@ -187,3 +188,26 @@ def compile_arn_sql_like(
re.sub(r"\*", "%", match.group(1)),
re.sub(r"\*", "%", match.group(2)),
)


def retry_on_errors(allowed_retries=1, on_errors=(), logger=None):
Contributor

The reason we're seeing this issue is a race condition that happens between

  1. The task that inserts packages found from the repodata.json that gets downloaded when a channel gets updated (task_update_conda_channel), and
  2. The preexisting conda packages in the conda-store prefixes (action_add_conda_prefix_packages)

It seems like making this decorator instantly retry the decorated function upon failure is just delaying one of the two competing tasks rather than solving the underlying race condition. The bulk insert above that gets retried seems like an optimization (that's why it's using bulk insert rather than package by package insertion), so is it possible that action_add_conda_prefix_packages wouldn't yet be done by the time the retry is made?

Contributor Author

Yep, I think you're right. We could wait some time before retrying. But maybe it's worth thinking about it a little more to find a better solution 🤔
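The "wait some time before retrying" idea could look like exponential backoff between attempts. A hedged sketch; the function name and delay values are illustrative and not part of this PR:

```python
import time

def retry_with_backoff(func, on_errors, retries=3, base_delay=0.1):
    """Call func(), sleeping base_delay * 2**attempt between failures."""
    for attempt in range(retries + 1):
        try:
            return func()
        except on_errors:
            if attempt == retries:
                raise  # out of retries; propagate
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
```

Backoff gives the competing task (e.g. action_add_conda_prefix_packages) time to finish, though it still only delays rather than removes the race.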

Contributor Author

I was thinking of a few other options:

  1. Don't run action_add_conda_prefix_packages while the task_update_conda_channel task is running. This option isn't great because if the task fails, the packages won't get added and we'll have no way to know about the missing data.
  2. When updating channels, check and insert each package instead of doing a bulk insert. This would put a lot more load on the database and make things less efficient; we insert on the order of tens of thousands of packages per architecture. Probably not a good option.
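The trade-off in option 2 can be illustrated with SQLite and a hypothetical one-column table (purely illustrative, not conda-store code): a bulk insert is one statement for the whole batch, while check-and-insert costs roughly two statements per package but tolerates individual duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conda_package (sha TEXT PRIMARY KEY)")
packages = [(f"sha{i}",) for i in range(1000)]

# Bulk insert: a single statement, but the whole batch fails with
# IntegrityError if any row already exists.
conn.executemany("INSERT INTO conda_package VALUES (?)", packages)

# Check-and-insert: one SELECT plus one INSERT per package. Robust to
# rows inserted concurrently by another task, but ~2N statements
# instead of 1 for N packages.
for (sha,) in [("sha0",), ("new-package",)]:
    exists = conn.execute(
        "SELECT 1 FROM conda_package WHERE sha=?", (sha,)
    ).fetchone()
    if not exists:
        conn.execute("INSERT INTO conda_package VALUES (?)", (sha,))

count = conn.execute("SELECT COUNT(*) FROM conda_package").fetchone()[0]
```

At tens of thousands of packages per architecture, the per-package round trips are where the extra database load comes from.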

Labels
type: bug 🐛 Something isn't working

Successfully merging this pull request may close these issues:

[BUG] - IntegrityError upon startup