[omicron-package] Retry failed downloads automatically #4168
@@ -303,8 +325,63 @@ async fn get_sha256_digest(path: &PathBuf) -> Result<Digest> {
    Ok(context.finish())
}

async fn download_prebuilt(
This is basically just refactored to make re-calling it easier
Thanks for doing this.
It'd be nice if we could use the existing general-purpose retry stuff but I get that the policies they use today aren't appropriate here.
I also wonder if we're going to want to resume downloads from where they left off. (If the server is failing somewhat often, we might never be able to do a full download, but we would be able to make progress if we resumed partial downloads.) But we can always add this if we continue to see problems.
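To make the resume idea concrete, here is a minimal sketch of how a client could work out where to pick up a partial download. It assumes the server honours HTTP `Range` requests, which is not confirmed for the server used here. The helper name `resume_range_header` is hypothetical and does not appear in this PR.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper (not part of this PR): returns the value for an HTTP
/// `Range` header that asks for the bytes we do not have yet, or `None` when
/// there is no partial file and the download should start from scratch.
fn resume_range_header(path: &Path) -> Option<String> {
    match fs::metadata(path) {
        // A file holding N bytes covers offsets 0..N-1, so resume at offset N.
        Ok(meta) if meta.len() > 0 => Some(format!("bytes={}-", meta.len())),
        // Missing, unreadable, or empty file: do a full download.
        _ => None,
    }
}
```

A caller would also need to open the file in append mode instead of truncating it, and confirm that the server answered with `206 Partial Content` before appending. The SHA-256 check would still decide whether the finished file is valid.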
.content_length()
.ok_or_else(|| anyhow!("Missing Content Length"))?,
);
let mut file = tokio::fs::File::create(&path)
I just noticed that we don't explicitly remove the old file. Does this remove it, or at least truncate it?
It truncates it on the second attempt. If it helps, the hash won't match on partial downloads, so it would know that it needs to retry if we re-execute.
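The truncation behaviour can be checked in isolation. The sketch below uses the synchronous `std::fs::File::create`, which creates the file if it is missing and truncates it if it exists. `tokio::fs::File::create` is the async counterpart of the same call. The helper `write_attempt` is hypothetical and only stands in for one download attempt.

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

/// Hypothetical stand-in for one download attempt: (re)creates `path`,
/// writes `bytes`, and returns the resulting file length on disk.
fn write_attempt(path: &Path, bytes: &[u8]) -> std::io::Result<u64> {
    // `File::create` truncates an existing file, so nothing left over
    // from an earlier, interrupted attempt survives this call.
    let mut file = File::create(path)?;
    file.write_all(bytes)?;
    file.flush()?;
    Ok(fs::metadata(path)?.len())
}
```

Calling `write_attempt` twice on the same path, first with a long buffer and then with a short one, leaves only the short contents behind, so a retry never appends to a partial file.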
This will probably require adding support in buildomat (oxidecomputer/buildomat#31).
Tested manually, by running:
Then cycling my machine's network connectivity. I observed the "Failed to download prebuilt" messages, saw the attempts ticking down, and then reconnected. Once connectivity was restored, the download succeeded.
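The behaviour described above, a bounded number of attempts that tick down until one succeeds, can be sketched as follows. This is a synchronous, standard-library-only illustration, not the code in this PR, which is async. The name `retry_with_attempts`, its parameters, and the exact log wording are assumptions.

```rust
use std::fmt::Display;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: runs `op` up to `attempts` times, sleeping `delay`
/// after each failure, and returns the first success or the last error.
fn retry_with_attempts<T, E, F>(attempts: u32, delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
    E: Display,
{
    assert!(attempts > 0, "need at least one attempt");
    let mut remaining = attempts;
    loop {
        remaining -= 1;
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the final error to the caller.
            Err(err) if remaining == 0 => return Err(err),
            Err(err) => {
                // Illustrative message; the real log format may differ.
                eprintln!("Failed to download prebuilt ({remaining} attempts remaining): {err}");
                sleep(delay);
            }
        }
    }
}
```

A fixed attempt count with a short delay suits a one-off build step that should fail promptly when the network is really down, which is one reason a general-purpose backoff policy may not be a good fit here.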
Fixes #4165