rearrange buildomat jobs; rewrite releng process in rust and aggressively parallelize (#5744)

(Note: the documentation says `cargo xtask releng`, but I am going to wire
that up in a follow-up PR; the current equivalent is `cargo run
--release --bin omicron-releng`.)

Prior to this change, we had five main "release engineering" Buildomat
jobs that performed operations beyond running the test suite:
- a **package** job which runs omicron-package in various
configurations,
- a **build OS images** job which builds the host and trampoline images,
- a **TUF repo** job which builds the final TUF repo *(this is the build
artifact we actually want)*,
- a **deploy** job which uses the single-sled packages to test that a VM
boots to SSH *(this is a test we actually want)*,
- and a **CI tools** job which builds common tools used by multiple
jobs.

This looks like:

```mermaid
graph LR
    package --> host-image["build OS images"]
    package --> deploy
    package --> tuf-repo["TUF repo"]
    host-image --> tuf-repo
    ci-tools["CI tools"] --> deploy
    ci-tools --> tuf-repo
```

(There are also the currently disabled a4x2 jobs, but those are
independent of this particular graph.)

I think the initial idea behind this was to reuse build artifacts where
possible. However, the result is pretty complicated, and uploading and
downloading outputs between jobs adds far more overhead than expected.
That overhead delays the end artifact we actually want.

This PR changes the graph to:

```mermaid
graph LR
    package --> deploy
    tuf-repo["TUF repo"]
```

The **TUF repo** job now primarily runs a new **releng** binary. It
performs every step required to download and build all the components
of the TUF repo within a single task, in parallel where possible, using a
terrible job runner I wrote.
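The job runner itself is Rust code that isn't shown on this page. The Bash sketch below is a rough, hypothetical illustration of the underlying idea, not the releng implementation. It starts independent steps concurrently, waits for all of them, and only then runs the step that needs their outputs. The step names and the `run_step` helper are made up, and `sleep` stands in for real build work.

```bash
#!/usr/bin/env bash
# Hypothetical fan-out/fan-in sketch. This is NOT how omicron-releng is
# implemented; the step names and run_step are illustrative only.
set -o errexit
set -o pipefail

# run_step NAME SECONDS: stand-in for a real build step (hypothetical).
run_step() {
    sleep "$2"
    echo "done: $1"
}

# Fan out: start each independent step in the background and record its PID.
steps=(host-image trampoline-image zone-packages)
pids=()
for step in "${steps[@]}"; do
    run_step "$step" 1 &
    pids+=("$!")
done

# Fan in: wait on each PID individually so any failing step is detected.
failed=0
for i in "${!pids[@]}"; do
    if ! wait "${pids[$i]}"; then
        echo "failed: ${steps[$i]}" >&2
        failed=1
    fi
done
[[ $failed -eq 0 ]] || exit 1

# Only now run the step that consumes all of the outputs above.
run_step tuf-repo 0
```

Waiting on each PID separately matters here. A bare `wait` with no arguments returns 0 regardless of how the children exited, so it could not report which step failed.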

The primary goal here was to reduce the time from pushing a commit to
getting a TUF repo out the other end. This change drops time-to-TUF-repo
from ~80 minutes to ~45. In the process, it also became much easier to
build a TUF repo (and iterate on that process) locally: just run `cargo
xtask releng` (TODO: soon). It also deleted a lot of Bash.

One thing to note: to get time-to-TUF-repo as low as possible, that job
_only_ uploads the TUF repo (and some logs). I also put all of the
outputs of the **package** job into a single tarball for the **deploy**
job to unpack. There are no longer separate uploads for the OS images and
each zone; these can be extracted from the TUF repo as we normally do.
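The hand-off between the **package** and **deploy** jobs boils down to two operations. One job creates a single compressed tarball from a list of paths, and the other unpacks it in a different working directory. Below is a minimal, self-contained Bash sketch of that round trip. The temporary directories and file names are hypothetical stand-ins for the real build outputs and Buildomat paths.

```bash
#!/usr/bin/env bash
# Sketch of a single-tarball hand-off between two jobs. Directory and file
# names are hypothetical stand-ins, not the real Buildomat layout.
set -o errexit
set -o pipefail

src=$(mktemp -d)    # plays the role of the package job's checkout
out=$(mktemp -d)    # plays the role of the package job's output directory
dst=$(mktemp -d)    # plays the role of the deploy job's work directory
trap 'rm -rf "$src" "$out" "$dst"' EXIT

# "package" job: produce a couple of outputs...
mkdir -p "$src/out" "$src/tests"
echo "zone image" > "$src/out/example-output"
printf '#!/bin/sh\necho ok\n' > "$src/tests/example-test"
chmod a+x "$src/tests/example-test"

# ...then bundle them into one archive, paths relative to the checkout.
files=(
    out/example-output
    tests/example-test
)
(cd "$src" && tar czf "$out/package.tar.gz" "${files[@]}")

# "deploy" job: unpack the single archive and use its contents directly.
(cd "$dst" && tar xzf "$out/package.tar.gz")
result=$("$dst/tests/example-test")
```

Because the executable bit survives the archive, the deploy side can run test binaries straight out of the unpacked tree. This also removes the need for the old per-file `gunzip` and `chmod` loop.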
iliana authored May 15, 2024
1 parent 5ace1af commit 59636c9
Showing 26 changed files with 1,754 additions and 574 deletions.
77 changes: 0 additions & 77 deletions .github/buildomat/jobs/ci-tools.sh

This file was deleted.

12 changes: 1 addition & 11 deletions .github/buildomat/jobs/deploy.sh
@@ -20,8 +20,6 @@
#: [dependencies.package]
#: job = "helios / package"
#:
#: [dependencies.ci-tools]
#: job = "helios / CI tools"

set -o errexit
set -o pipefail
@@ -144,13 +142,6 @@ pfexec chown build:build /opt/oxide/work
cd /opt/oxide/work

ptime -m tar xvzf /input/package/work/package.tar.gz
cp /input/package/work/zones/* out/
mv out/nexus-single-sled.tar.gz out/nexus.tar.gz
mkdir tests
for p in /input/ci-tools/work/end-to-end-tests/*.gz; do
ptime -m gunzip < "$p" > "tests/$(basename "${p%.gz}")"
chmod a+x "tests/$(basename "${p%.gz}")"
done

# Ask buildomat for the range of extra addresses that we're allowed to use, and
# break them up into the ranges we need.
@@ -354,7 +345,7 @@ echo "Waited for nexus: ${retry}s"

export RUST_BACKTRACE=1
export E2E_TLS_CERT IPPOOL_START IPPOOL_END
eval "$(./tests/bootstrap)"
eval "$(./target/debug/bootstrap)"
export OXIDE_HOST OXIDE_TOKEN

#
@@ -387,7 +378,6 @@ done
/usr/oxide/oxide --resolve "$OXIDE_RESOLVE" --cacert "$E2E_TLS_CERT" \
image promote --project images --image debian11

rm ./tests/bootstrap
for test_bin in tests/*; do
./"$test_bin"
done
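In the updated **deploy** job, `eval "$(./target/debug/bootstrap)"` runs the bootstrap binary and evaluates whatever it prints as shell code in the current shell. After that, `export OXIDE_HOST OXIDE_TOKEN` exports the resulting variables to the test binaries. The page doesn't show bootstrap's exact output format. The sketch below assumes it prints plain `NAME=value` assignments, and it uses a hypothetical `fake_bootstrap` function in place of the real binary.

```bash
#!/usr/bin/env bash
set -o errexit
set -o pipefail

# Hypothetical stand-in for ./target/debug/bootstrap. Assumption: the real
# binary prints shell variable assignments on stdout. printf %q quotes each
# value so that eval reconstructs it exactly, spaces included.
fake_bootstrap() {
    printf 'OXIDE_HOST=%q\n' "https://oxide.example.com"
    printf 'OXIDE_TOKEN=%q\n' "example token with spaces"
}

# Evaluate the helper's output in the current shell so the assignments
# take effect here rather than in a child process.
eval "$(fake_bootstrap)"

# Mark the variables for export so child processes (the tests) see them.
export OXIDE_HOST OXIDE_TOKEN
```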
93 changes: 0 additions & 93 deletions .github/buildomat/jobs/host-image.sh

This file was deleted.

115 changes: 18 additions & 97 deletions .github/buildomat/jobs/package.sh
@@ -3,24 +3,11 @@
#: name = "helios / package"
#: variety = "basic"
#: target = "helios-2.0"
#: rust_toolchain = "1.72.1"
#: rust_toolchain = "1.77.2"
#: output_rules = [
#: "=/work/version.txt",
#: "=/work/package.tar.gz",
#: "=/work/global-zone-packages.tar.gz",
#: "=/work/trampoline-global-zone-packages.tar.gz",
#: "=/work/zones/*.tar.gz",
#: ]
#:
#: [[publish]]
#: series = "image"
#: name = "global-zone-packages"
#: from_output = "/work/global-zone-packages.tar.gz"
#:
#: [[publish]]
#: series = "image"
#: name = "trampoline-global-zone-packages"
#: from_output = "/work/trampoline-global-zone-packages.tar.gz"

set -o errexit
set -o pipefail
@@ -32,17 +19,6 @@ rustc --version
WORK=/work
pfexec mkdir -p $WORK && pfexec chown $USER $WORK

#
# Generate the version for control plane artifacts here. We use `0.git` as the
# prerelease field because it comes before `alpha`.
#
# In this job, we stamp the version into packages installed in the host and
# trampoline global zone images.
#
COMMIT=$(git rev-parse HEAD)
VERSION="8.0.0-0.ci+git${COMMIT:0:11}"
echo "$VERSION" >/work/version.txt

ptime -m ./tools/install_builder_prerequisites.sh -yp
ptime -m ./tools/ci_download_softnpu_machinery

@@ -52,88 +28,33 @@ ptime -m cargo run --locked --release --bin omicron-package -- \
-t test target create -i standard -m non-gimlet -s softnpu -r single-sled
ptime -m cargo run --locked --release --bin omicron-package -- \
-t test package
mapfile -t packages \
< <(cargo run --locked --release --bin omicron-package -- -t test list-outputs)

# Build the xtask binary used by the deploy job
ptime -m cargo build --locked --release -p xtask

# Assemble some utilities into a tarball that can be used by deployment
# phases of buildomat.
# Build the end-to-end tests
# Reduce debuginfo just to line tables.
export CARGO_PROFILE_DEV_DEBUG=line-tables-only
export CARGO_PROFILE_TEST_DEBUG=line-tables-only
ptime -m cargo build --locked -p end-to-end-tests --tests --bin bootstrap \
--message-format json-render-diagnostics >/tmp/output.end-to-end.json
mkdir tests
/opt/ooce/bin/jq -r 'select(.profile.test) | .executable' /tmp/output.end-to-end.json \
| xargs -I {} -t cp {} tests/

# Assemble these outputs and some utilities into a tarball that can be used by
# deployment phases of buildomat.

files=(
out/*.tar
out/target/test
out/npuzone/*
package-manifest.toml
smf/sled-agent/non-gimlet/config.toml
target/release/omicron-package
target/release/xtask
target/debug/bootstrap
tests/*
)

ptime -m tar cvzf $WORK/package.tar.gz "${files[@]}"

tarball_src_dir="$(pwd)/out/versioned"
stamp_packages() {
for package in "$@"; do
cargo run --locked --release --bin omicron-package -- stamp "$package" "$VERSION"
done
}

# Keep the single-sled Nexus zone around for the deploy job. (The global zone
# build below overwrites the file.)
mv out/nexus.tar.gz out/nexus-single-sled.tar.gz

# Build necessary for the global zone
ptime -m cargo run --locked --release --bin omicron-package -- \
-t host target create -i standard -m gimlet -s asic -r multi-sled
ptime -m cargo run --locked --release --bin omicron-package -- \
-t host package
stamp_packages omicron-sled-agent mg-ddm-gz propolis-server overlay oxlog pumpkind-gz

# Create global zone package @ $WORK/global-zone-packages.tar.gz
ptime -m ./tools/build-global-zone-packages.sh "$tarball_src_dir" $WORK

# Non-Global Zones

# Assemble Zone Images into their respective output locations.
#
# Zones that are included into another are intentionally omitted from this list
# (e.g., the switch zone tarballs contain several other zone tarballs: dendrite,
# mg-ddm, etc.).
#
# Note that when building for a real gimlet, `propolis-server` and `switch-*`
# should be included in the OS ramdisk.
mkdir -p $WORK/zones
zones=(
out/clickhouse.tar.gz
out/clickhouse_keeper.tar.gz
out/cockroachdb.tar.gz
out/crucible-pantry-zone.tar.gz
out/crucible-zone.tar.gz
out/external-dns.tar.gz
out/internal-dns.tar.gz
out/nexus.tar.gz
out/nexus-single-sled.tar.gz
out/oximeter.tar.gz
out/propolis-server.tar.gz
out/switch-*.tar.gz
out/ntp.tar.gz
out/omicron-gateway-softnpu.tar.gz
out/omicron-gateway-asic.tar.gz
out/overlay.tar.gz
out/probe.tar.gz
)
cp "${zones[@]}" $WORK/zones/

#
# Global Zone files for Trampoline image
#

# Build necessary for the trampoline image
ptime -m cargo run --locked --release --bin omicron-package -- \
-t recovery target create -i trampoline
ptime -m cargo run --locked --release --bin omicron-package -- \
-t recovery package
stamp_packages installinator mg-ddm-gz

# Create trampoline global zone package @ $WORK/trampoline-global-zone-packages.tar.gz
ptime -m ./tools/build-trampoline-global-zone-packages.sh "$tarball_src_dir" $WORK
ptime -m tar cvzf $WORK/package.tar.gz "${files[@]}" "${packages[@]}"
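The updated **package** job collects the paths printed by `omicron-package -t test list-outputs` into a Bash array with `mapfile -t`. It then passes that array to `tar` together with the fixed `files` list. The sketch below shows the same pattern in a scratch directory, with a hypothetical `list_outputs` function standing in for `omicron-package`. `mapfile -t` stores one line per array element and strips the trailing newline. Quoting `"${packages[@]}"` keeps each path a separate argument.

```bash
#!/usr/bin/env bash
set -o errexit
set -o pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical stand-in for `omicron-package -t test list-outputs`:
# prints one output path per line.
list_outputs() {
    printf '%s\n' out/alpha.tar.gz out/beta.tar.gz
}

# Create placeholder files so tar has something to archive.
mkdir -p out
touch out/alpha.tar.gz out/beta.tar.gz package-manifest.toml

# Read each line of output into its own array element (-t drops newlines).
mapfile -t packages < <(list_outputs)

files=(
    package-manifest.toml
)

# Expand both arrays in quotes so every path stays a single argument.
tar czf package.tar.gz "${files[@]}" "${packages[@]}"
```

Using process substitution (`< <(...)`) rather than a pipe matters here. `mapfile` runs in the current shell, so `packages` is still populated afterwards.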