Skip to content

Commit

Permalink
phd: add basic "migration-from-base" tests + machinery (#609)
Browse files Browse the repository at this point in the history
In order to ensure that changes to `propolis` don't break instance
migration from previous versions, we would like to add automated testing
of migrating an instance from the current `master` branch of `propolis`
to run on PR branches. This PR adds an implementation of such a test to
`phd`.

To implement this, I've built on top of my change from PR #604 and
modified the `phd` artifact store to introduce a notion of a "base"
Propolis server artifact. This artifact can then be used to test
migration from the "base" Propolis version to the revision under
test. I've added a new test case in `migrate.rs` that creates a source
VM using the "base" Propolis artifact and attempts to migrate that
instance to a target VM running on the "default" Propolis artifact (the
revision being tested). In order to add the new test, I've factored out
test code from the existing `migrate::smoke_test` test.

How `phd` should acquire a "base" Propolis artifact is configured by
several new command-line arguments. `--base-propolis-branch` takes
the name of a Git branch on the `propolis` repo. If this argument is
provided, PHD will download the Propolis debug artifact from the HEAD
commit of that branch from Buildomat. Alternatively, the
`--base-propolis-commit` argument accepts a Git commit hash to
download from Buildomat. Finally, the `--base-propolis-cmd` argument
takes a local path to a binary to use as the "base" Propolis. All
these arguments are mutually exclusive, and if none of them are
provided, the migration-from-base tests are skipped.

When the "base" Propolis artifact is configured from a Git branch
name (i.e. the `--base-propolis-branch` CLI argument is passed), we
use the Buildomat `/public/branch/{repo}/{branch-name}` endpoint, which
returns the Git hash of the HEAD commit to that branch. Then, we attempt
to download an artifact from Buildomat for that commit hash. An issue
here is that Buildomat's branch endpoint will return the latest commit
hash for that branch as soon as it sees a commit, but the artifact for
that commit may not have been published yet, so downloading it will
fail. Ideally, we could resolve this sort of issue by configuring the
`phd-run` job for PRs to depend on the `phd-build` job for `master`, so
that the branch's test run isn't started until any commits that just
merged to `master` have published artifacts. However, this isn't
basely possible in Buildomat (see oxidecomputer/buildomat#46). As a
temporary workaround, I've added code to the PHD artifact store to retry
downloading Buildomat artifacts with an exponential backoff, for up to a
configurable duration (defaulting to 20 minutes). This allows us to wait
for an in-progress build to complete, with a limit on how long we'll
wait for.

Depends on #604
  • Loading branch information
hawkw authored Jan 21, 2024
1 parent 759bf4a commit a13ebb9
Show file tree
Hide file tree
Showing 12 changed files with 834 additions and 246 deletions.
3 changes: 2 additions & 1 deletion .github/buildomat/jobs/phd-run.sh
Original file line number Diff line number Diff line change
Expand Up @@ -58,8 +58,9 @@ set +e
(RUST_BACKTRACE=1 ptime -m pfexec $runner \
--emit-bunyan \
run \
--crucible-downstairs-commit auto \
--propolis-server-cmd $propolis \
--base-propolis-branch master \
--crucible-downstairs-commit auto \
--artifact-toml-path $artifacts \
--tmp-directory $tmpdir \
--artifact-directory $tmpdir | \
Expand Down
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions phd-tests/framework/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ doctest = false
[dependencies]
anyhow.workspace = true
backoff.workspace = true
bytes.workspace = true
bhyve_api.workspace = true
camino = { workspace = true, features = ["serde1"] }
cfg-if.workspace = true
Expand Down
272 changes: 272 additions & 0 deletions phd-tests/framework/src/artifacts/buildomat.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,272 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.
use super::DownloadConfig;
use anyhow::Context;
use camino::Utf8Path;
use serde::{Deserialize, Serialize};
use std::{borrow::Cow, fmt, str::FromStr, time::Duration};

#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(transparent)]
pub(super) struct Repo(Cow<'static, str>);

#[derive(Clone, Debug, Serialize, Eq, PartialEq)]
#[serde(transparent)]
pub struct Commit(String);

#[derive(Clone, Debug, Serialize, Deserialize)]
#[serde(transparent)]
pub(super) struct Series(Cow<'static, str>);

#[derive(Clone, Debug, Serialize, Deserialize)]
pub struct BuildomatArtifact {
pub(super) repo: Repo,
pub(super) series: Series,
pub(super) commit: Commit,
pub(super) sha256: String,
}

const BASE_URI: &str = "https://buildomat.eng.oxide.computer/public";

impl Repo {
pub(super) const fn from_static(s: &'static str) -> Self {
Self(Cow::Borrowed(s))
}

pub(super) fn artifact_for_commit(
self,
series: Series,
commit: Commit,
filename: impl AsRef<Utf8Path>,
downloader: &DownloadConfig,
) -> anyhow::Result<BuildomatArtifact> {
let filename = filename.as_ref();
let sha256 = self.get_sha256(&series, &commit, filename, downloader)?;

Ok(BuildomatArtifact { repo: self, series, commit, sha256 })
}

pub(super) fn get_branch_head(
&self,
branch: &str,
) -> anyhow::Result<Commit> {
(|| {
let uri = format!("{BASE_URI}/branch/{self}/{branch}");
let client = reqwest::blocking::ClientBuilder::new()
.timeout(Duration::from_secs(5))
.build()?;
let req = client.get(uri).build()?;
let rsp = client.execute(req)?;
let status = rsp.status();
anyhow::ensure!(status.is_success(), "HTTP status: {status}");
let bytes = rsp.bytes()?;
str_from_bytes(&bytes)?.parse::<Commit>()
})()
.with_context(|| {
format!("Failed to determine HEAD commit for {self}@{branch}")
})
}

fn get_sha256(
&self,
series: &Series,
commit: &Commit,
filename: &Utf8Path,
downloader: &DownloadConfig,
) -> anyhow::Result<String> {
(|| {
let filename = filename
.file_name()
.ok_or_else(|| {
anyhow::anyhow!(
"Buildomat filename has no filename: {filename:?}"
)
})?
// Strip the file extension, if any.
//
// Note: we use `Utf8PathBuf::file_name` and then split on '.'s
// rather than using `Utf8PathBuf::file_stem`, because the latter
// only strips off the rightmost file extension, rather than all
// extensions. So, "foo.tar.gz" has a `file_stem()` of "foo.tar",
// rather than "foo".
//
// TODO(eliza): `std::path::Path` has an unstable `file_prefix()`
// method, which does exactly what we would want here (see
// https://github.com/rust-lang/rust/issues/86319). If this is
// stabilized, and `camino` adds a `file_prefix()` method wrapping
// it, this code can be replaced with just `filename.file_prefix()`.
.split('.')
.next()
.ok_or_else(|| {
anyhow::anyhow!(
"Buildomat filename has no filename prefix: {filename:?}"
)
})?;
let uri = format!("{BASE_URI}/file/{self}/{series}/{commit}/{filename}.sha256.txt");
let bytes = downloader.download_buildomat_uri(&uri)?;
str_from_bytes(&bytes).map(String::from)
})().with_context(|| {
format!("Failed to get SHA256 for {self}@{commit}, series: {series}, file: {filename})")
})
}
}

impl fmt::Display for Repo {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.0.fmt(f)
}
}

impl FromStr for Commit {
type Err = anyhow::Error;

fn from_str(s: &str) -> Result<Self, Self::Err> {
let s = s.trim();

// Ensure this looks like a valid Git commit.
anyhow::ensure!(
s.len() == 40,
"Buildomat requires full (40-character) Git commit hashes"
);

for c in s.chars() {
if !c.is_ascii_hexdigit() {
anyhow::bail!(
"'{c}' is not a valid hexadecimal digit; Git \
commit hashes should consist of the characters \
[0-9, a-f, A-F]"
);
}
}

Ok(Self(s.to_string()))
}
}

impl fmt::Display for Commit {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
f.write_str(&self.0)
}
}

impl<'de> Deserialize<'de> for Commit {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let s = String::deserialize(deserializer)?;
FromStr::from_str(&s).map_err(serde::de::Error::custom)
}
}

impl Series {
pub(super) const fn from_static(s: &'static str) -> Self {
Self(Cow::Borrowed(s))
}
}

impl fmt::Display for Series {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.0.fmt(f)
}
}

impl BuildomatArtifact {
pub(super) fn uri(&self, filename: impl AsRef<Utf8Path>) -> String {
let Self {
repo: Repo(ref repo),
series: Series(ref series),
commit: Commit(ref commit),
..
} = self;
let filename = filename.as_ref();
format!("{BASE_URI}/file/{repo}/{series}/{commit}/{filename}")
}
}

impl fmt::Display for BuildomatArtifact {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let Self {
repo: Repo(ref repo),
series: Series(ref series),
commit: Commit(ref commit),
..
} = self;
write!(f, "Buildomat {repo}/{series}@{commit}")
}
}

impl super::DownloadConfig {
/// Download a file from the provided Buildomat URI.
///
/// This method will retry the download if Buildomat returns an error that
/// indicates a file does not yet exist, for up to the configurable maximum
/// retry duration. This retry logic serves as a mechanism for PHD to wait
/// for an artifact we expect to exist to be published, when the build that
/// publishes that artifact is still in progress.
pub(super) fn download_buildomat_uri(
&self,
uri: &str,
) -> anyhow::Result<bytes::Bytes> {
tracing::info!(
timeout = ?self.timeout,
%uri,
"Downloading file from Buildomat...",
);
let client = reqwest::blocking::ClientBuilder::new()
.timeout(self.timeout)
.build()?;
let try_download = || {
let request = client
.get(uri)
.build()
// failing to build the request is a permanent (non-retryable)
// error, because any retries will use the same URI and request
// configuration, so they'd fail as well.
.map_err(|e| backoff::Error::permanent(e.into()))?;

let response = client
.execute(request)
.map_err(|e| backoff::Error::transient(e.into()))?;
if !response.status().is_success() {
// when downloading a file from buildomat, we currently retry
// all errors, since buildomat returns 500s when an artifact
// doesn't exist. hopefully, this will be fixed upstream soon:
// https://github.com/oxidecomputer/buildomat/pull/48
let err = anyhow::anyhow!(
"Buildomat returned HTTP error {}",
response.status()
);
return Err(backoff::Error::transient(err));
}
Ok(response)
};

let log_retry = |error, wait| {
tracing::info!(
%error,
%uri,
"Buildomat download failed, trying again in {wait:?}..."
);
};

let bytes = backoff::retry_notify(
self.buildomat_backoff.clone(),
try_download,
log_retry,
)
.map_err(|e| match e {
backoff::Error::Permanent(e) => e,
backoff::Error::Transient { err, .. } => err,
})
.with_context(|| format!("Failed to download '{uri}' from Buildomat"))?
.bytes()?;

Ok(bytes)
}
}

fn str_from_bytes(bytes: &bytes::Bytes) -> anyhow::Result<&str> {
Ok(std::str::from_utf8(bytes.as_ref())?.trim())
}
66 changes: 18 additions & 48 deletions phd-tests/framework/src/artifacts/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,20 +5,17 @@
//! Support for working with files consumed by PHD test runs.
use serde::{Deserialize, Serialize};
use std::str::FromStr;
use std::time::Duration;

pub mod buildomat;
mod manifest;
mod store;

pub use store::Store as ArtifactStore;

pub const DEFAULT_PROPOLIS_ARTIFACT: &str = "__DEFAULT_PROPOLIS";

pub const CRUCIBLE_DOWNSTAIRS_ARTIFACT: &str = "__DEFAULT_CRUCIBLE_DOWNSTAIRS";

#[derive(Clone, Debug, Serialize, Eq, PartialEq)]
#[serde(transparent)]
pub struct Commit(String);
pub const BASE_PROPOLIS_ARTIFACT: &str = "__BASE_PROPOLIS";

#[derive(Clone, Debug, Serialize, Deserialize, Eq, PartialEq)]
#[serde(rename_all = "snake_case")]
Expand All @@ -34,7 +31,7 @@ enum ArtifactKind {
enum ArtifactSource {
/// Get the artifact from Buildomat. This downloads from
/// https://buildomat.eng.oxide.computer/public/file/REPO/SERIES/COMMIT.
Buildomat { repo: String, series: String, commit: Commit, sha256: String },
Buildomat(buildomat::BuildomatArtifact),

/// Get the artifact from the manifest's list of remote artifact servers.
RemoteServer { sha256: String },
Expand All @@ -48,7 +45,7 @@ enum ArtifactSource {
struct Artifact {
/// The artifact file's name. When reacquiring an artifact from its source,
/// this filename is appended to the URI generated from that source.
filename: String,
filename: camino::Utf8PathBuf,

/// The kind of artifact this is.
kind: ArtifactKind,
Expand All @@ -62,44 +59,17 @@ struct Artifact {
untar: Option<camino::Utf8PathBuf>,
}

impl FromStr for Commit {
type Err = anyhow::Error;

fn from_str(s: &str) -> Result<Self, Self::Err> {
let s = s.trim();

// Ensure this looks like a valid Git commit.
anyhow::ensure!(
s.len() == 40,
"Buildomat requires full (40-character) Git commit hashes"
);

for c in s.chars() {
if !c.is_ascii_hexdigit() {
anyhow::bail!(
"'{c}' is not a valid hexadecimal digit; Git \
commit hashes should consist of the characters \
[0-9, a-f, A-F]"
);
}
}

Ok(Self(s.to_string()))
}
}

impl std::fmt::Display for Commit {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.write_str(&self.0)
}
}

impl<'de> Deserialize<'de> for Commit {
fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
where
D: serde::Deserializer<'de>,
{
let s = String::deserialize(deserializer)?;
FromStr::from_str(&s).map_err(serde::de::Error::custom)
}
#[derive(Debug)]
struct DownloadConfig {
timeout: Duration,
/// Retry backoff settings used when downloading files from Buildomat.
///
/// Retries for Buildomat artifact sources are configured separately from
/// retries for remote URI artifact sources (which we don't currently retry;
/// but probably should). This is because we use a very long maximum
/// duration for retries for Buildomat artifacts, as a way of waiting for an
/// in-progress build to complete (20 minutes by default). On the other
/// hand, we probably don't want to retry a download from S3 for 20 minutes.
buildomat_backoff: backoff::ExponentialBackoff,
remote_server_uris: Vec<String>,
}
Loading

0 comments on commit a13ebb9

Please sign in to comment.