Retry updating conda packages #976
base: main
Conversation
✅ Deploy Preview for conda-store canceled.
Force-pushed from e8c95c9 to 99f1ad4
@@ -187,3 +188,26 @@ def compile_arn_sql_like(
        re.sub(r"\*", "%", match.group(1)),
        re.sub(r"\*", "%", match.group(2)),
    )


def retry_on_errors(allowed_retries=1, on_errors=(), logger=None):
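The decorator's body is not shown in this excerpt. A minimal sketch of what a function with this signature could look like, assuming an immediate retry with no delay (illustrative only, not necessarily the code added by this PR):

```python
import functools
import logging


def retry_on_errors(allowed_retries=1, on_errors=(), logger=None):
    """Retry the wrapped function when it raises one of ``on_errors``.

    Sketch only; the actual implementation added by the PR may differ.
    """
    log = logger or logging.getLogger(__name__)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(allowed_retries + 1):
                try:
                    return func(*args, **kwargs)
                except on_errors as exc:
                    if attempt == allowed_retries:
                        raise
                    log.warning(
                        "%s raised %r; retrying (%d/%d)",
                        func.__name__, exc, attempt + 1, allowed_retries,
                    )
        return wrapper

    return decorator
```

Per the PR description further below, the intent is to apply this around the part of task_update_conda_channel that inserts conda packages, retrying when an integrity error occurs.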
The reason we're seeing this issue is a race condition between:
- the task that inserts the packages found in the repodata.json that gets downloaded when a channel is updated (task_update_conda_channel), and
- the action that inserts the preexisting conda packages in the conda-store prefixes (action_add_conda_prefix_packages).

It seems like making this decorator instantly retry the decorated function upon failure just delays one of the two competing tasks rather than solving the underlying race condition. The bulk insert above that gets retried looks like an optimization (that's why it uses a bulk insert rather than package-by-package insertion), so is it possible that action_add_conda_prefix_packages still won't be done by the time the retry is made?
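For concreteness, here is a small standalone illustration of the failure mode being discussed: a bulk insert fails as a whole when even one row collides with a row the competing writer already inserted. This uses SQLite plus SQLAlchemy; the table, columns, and package rows are made up and are not conda-store's actual schema.

```python
import sqlalchemy as sa
from sqlalchemy.exc import IntegrityError

engine = sa.create_engine("sqlite://")
metadata = sa.MetaData()
packages = sa.Table(
    "demo_conda_package",
    metadata,
    sa.Column("name", sa.String, primary_key=True),
    sa.Column("version", sa.String, primary_key=True),
)
metadata.create_all(engine)

# Simulates action_add_conda_prefix_packages inserting a prefix package first.
with engine.begin() as conn:
    conn.execute(packages.insert(), [{"name": "numpy", "version": "1.26.0"}])

# Simulates the channel update's bulk insert; one overlapping row makes the
# whole statement fail with an IntegrityError.
try:
    with engine.begin() as conn:
        conn.execute(
            packages.insert(),
            [
                {"name": "numpy", "version": "1.26.0"},
                {"name": "scipy", "version": "1.11.3"},
            ],
        )
except IntegrityError as exc:
    print("bulk insert failed:", exc.orig)
```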
Yep, I think you're right. We could wait some time before retrying. But maybe it's worth thinking about it a little more to find a better solution 🤔
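A minimal sketch of the "wait some time before retrying" idea, written here as a standalone helper. The name, signature, and linear backoff are made up for illustration; in the PR this logic would presumably live inside the decorator's retry loop.

```python
import time


def call_with_delayed_retries(func, allowed_retries=3, on_errors=(Exception,), delay_seconds=1.0):
    """Hypothetical helper (not from the PR): retry ``func`` with a pause between attempts."""
    for attempt in range(allowed_retries + 1):
        try:
            return func()
        except on_errors:
            if attempt == allowed_retries:
                raise
            # Wait before retrying so the competing task (e.g.
            # action_add_conda_prefix_packages) has time to finish its inserts.
            time.sleep(delay_seconds * (attempt + 1))  # simple linear backoff
```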
I was thinking of a few other options:
- Don't run action_add_conda_prefix_packages if the task_update_conda_channel task is running. This option isn't so good because if the task fails, the packages won't get added and we'll have no way to know about the missing data.
- When updating channels, check and insert each package instead of doing a bulk insert (see the sketch below). This would put a lot more load on the database and would definitely make things less efficient, since we are inserting on the order of tens of thousands of packages per architecture. Probably not a good option.
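For reference, a rough sketch of the second option (per-package check and insert) using SQLAlchemy; the model, columns, and session handling are stand-ins for illustration, not conda-store's actual ORM code.

```python
import sqlalchemy as sa
from sqlalchemy import orm

Base = orm.declarative_base()


class DemoCondaPackage(Base):
    __tablename__ = "demo_conda_package"
    name = sa.Column(sa.String, primary_key=True)
    version = sa.Column(sa.String, primary_key=True)


def insert_packages_one_by_one(session, packages):
    # One existence check plus insert per package: avoids the all-or-nothing
    # failure of a bulk insert, at the cost of many more database round trips.
    for pkg in packages:
        if session.get(DemoCondaPackage, (pkg["name"], pkg["version"])) is None:
            session.add(DemoCondaPackage(**pkg))
    session.commit()


engine = sa.create_engine("sqlite://")
Base.metadata.create_all(engine)
with orm.Session(engine) as session:
    insert_packages_one_by_one(
        session,
        [{"name": "numpy", "version": "1.26.0"}, {"name": "scipy", "version": "1.11.3"}],
    )
```

Each package costs at least one extra query here, which is why it scales poorly at tens of thousands of packages per architecture.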
Fixes #852
Description
The issue described in #852 arises because part of the environment build process is to execute action_add_conda_prefix_packages. This action tries to insert all the packages from the environment into the db. If these packages overlap with any of the packages inserted by task_update_conda_channel (which is common on a first run of the task), then we get the integrity error.

This PR adds a retry to the part of the task_update_conda_channel task that updates the conda_package table. So, if an integrity error occurs because a package was inserted by another process while a bulk update is happening, the whole process for inserting the conda packages will try again. Subsequent bulk inserts are expected to work, since part of the update process is to only insert packages that haven't already been added.

Other options I looked at to resolve this issue are:
Pull request checklist