Skip to content

Commit

Permalink
document tips for Omicron development (#3221)
Browse files Browse the repository at this point in the history
  • Loading branch information
davepacheco authored May 26, 2023
1 parent c27796f commit cab0925
Showing 1 changed file with 118 additions and 14 deletions.
132 changes: 118 additions & 14 deletions README.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,49 @@ You can **run the https://github.com/rust-lang/rust-clippy[Clippy linter]** usin

== Working in Omicron

Omicron is a pretty large repo containing a bunch of related components. The repo is structured as one large Cargo workspace. When you run `cargo build` or `cargo test`, you'll wind up building and testing everything. In practice, people are usually only working on one or two components at a time. You can usually iterate faster by specifying a specific package, like `cargo build -p omicron-nexus`.
Omicron is a pretty large repo containing a bunch of related components. If you just build the whole thing with `cargo build` or `cargo test`, it can take a while, even for incremental builds. Since most people are only working on a few of these components at a time, it's helpful to be know about Cargo's tools for working with individual packages in a workspace.

NOTE: This section assumes you're already familiar with the prerequisites and environment setup needed to do _any_ work on Omicron. See xref:docs/how-to-run-simulated.adoc[] or xref:docs/how-to-run.adoc[] for more on that.

=== Key tips

* Use `cargo check` when you just want to know if your code compiles. It's _much_ faster than `cargo build` or `cargo test`.
* When using Cargo's check/build/test/clippy commands, you can use the `-p PACKAGE` flag to only operate on a specific package. This often saves a lot of time for incremental builds.
* When using Cargo's check/build/clippy commands, use `--all-targets` to make sure you're checking or building the test code, too.

These are explained a bit more below, along with some common pitfalls.

Here's an example workflow. Suppose you're working on some changes to the Nexus database model (`nexus-db-model` package, located at `nexus/db-model` from the root). While you're actively writing and checking code, you might run:

```
cargo check --all-targets
```

_without_ any `-p` flag. Running this incrementally is pretty fast even on the whole workspace. This also uncovers places where your changes have broken code that uses this package. (If you're making big changes, you might not want that right away. In that case, you might choose to use `-p nexus-db-model` here.)

When you're ready to test the changes you've made, start with building and running tests for the most specific package you've changed:

```
cargo test -p nexus-db-model
```

Once that works, check the tests for the next package up:

```
cargo test -p omicron-nexus
```

When you're happy with things and want to make sure you haven't missed something, test everything:

```
cargo test
```

=== Rust packages in Omicron

NOTE: The term "package" is overloaded: most programming languages and operating systems have their own definitions of a package. On top of that, Omicron bundles up components into our own kind of "package" that gets delivered via the install and update systems. These are described in the `package-manifest.toml` file in the root of the repo. In this section, we're just concerned with Rust packages.

NOTE: There's also confusion in the Rust world about the terms https://doc.rust-lang.org/book/ch07-01-packages-and-crates.html["packages" and "crates"]. _Packages_ are the things that have a Cargo.toml file. (Workspaces like Omicron itself have Cargo.toml files, too.) Packages are also the things that you publish to crates.io (confusingly). One package might have a library, a standalone executable binary, several examples, integration tests, etc. that are all compiled individually and produce separate artifacts. These are what Rust calls _crates_. We're generally just concerned with packages here, not crates.

Here are some of the big components in the control plane that live in this repo:

Expand Down Expand Up @@ -89,7 +131,40 @@ Here are some of the big components in the control plane that live in this repo:

For those with access to Oxide RFDs, RFD 61 discusses the organization principles and key components in more detail.

Many of these components themselves are made up of subpackages (e.g., `nexus-db-queries` is under `omicron-nexus`). There are also many more top-level packages than what's mentioned above. These are used for common code, clients, tools, etc. For more, see the Rustdoc for each module. (Where docs are missing or incomplete, please contribute!)
Many of these components themselves are made up of other packages (e.g., `nexus-db-model` is under `omicron-nexus`). There are also many more top-level packages than what's mentioned above. These are used for common code, clients, tools, etc. For more, see the Rustdoc for each module. (Where docs are missing or incomplete, please contribute!)

Use Cargo's `-p PACKAGE` to check/build/test only the package you're working on. Since people are usually only working on one or two components at a time, you can usually iterate faster this way.

=== Why is Cargo rebuilding stuff all the time?

People are often surprised to find Cargo rebuilding stuff that it seems like it's just built, even when the relevant source files haven't changed.

* Say you're iterating on code, running `cargo build -p nexus-db-model` to build just that package. Great, it works. Let's run tests: `cargo test -p nexus-db-model`. Now it's rebuilding some _dependency_ of `nexus-db-model` again?!
* Say you've just run `cargo test -p nexus-db-model`. Now you go run `cargo test -p omicron-nexus`, which uses `nexus-db-model`. You see Cargo building `nexus-db-model` again?!

This usually has to do with the way Cargo selects package https://doc.rust-lang.org/cargo/reference/features.html[features]. These are essentially tags that are used at build time to include specific code or dependencies. For example, the https://docs.rs/serde/latest/serde/[serde] crate defines a feature called https://docs.rs/crate/serde/latest/features["derive"] that controls whether the `Serialize`/`Deserialize` derive macros will be included. Let's look at how this affects builds.

TIP: You can use `cargo tree` to inspect a package's dependencies, including features. This is useful for debugging feature-related build issues.

==== Feature selection differs when building tests

When you run `cargo build -p nexus-db-model`, Cargo looks at all the packages in the depencency tree of `nexus-db-model` and figures out what features it needs for each one. Let's take the `uuid` package as an example. Cargo takes https://doc.rust-lang.org/cargo/reference/features.html#feature-unification[union of the features required by any of the packages that depend on `uuid` in the whole dependency tree of `nexus-db-model`]. Let's say that's just the "v4" feature. Simple enough.

When you then run `cargo test -p nexus-db-model`, it does the same thing. Only this time, it's looking at the `dev-dependencies` tree. `nexus-db-model` 's dev-dependencies might include some other package that depends on `uuid` and requires the "v5" feature. Now, Cargo has to rebuild `uuid` -- and anything else that depends on it.

This is why when using Cargo's check/build/clippy commands, we suggest using `--all-targets`. When you use `cargo build --all-targets`, it builds the tests as well. It's usually not much more time and avoids extra rebuilds when switching back and forth between the default targets and the targets with tests included.

==== Feature selection differs when building different packages

People run into a similar problem when switching packages within Omicron. Once you've got `cargo test -p nexus-db-model` working, you may run `cargo test -p omicron-nexus`, which uses `nexus-db-model`. And you may be surprised to see Cargo rebuilding some common dependency like `uuid`. It's the same as above: we're building a different package now. It has a different (larger) dependency tree. That may result in some crate deep in the dependency tree needing some new feature, causing it and all of its dependents to be rebuilt.

NOTE: https://github.com/rust-lang/cargo/issues/4463[There is interest in changing the way feature selection works in workspaces like Omicron for exactly this reason.] It's been suggested to have an option for Cargo to always look at the features required for all packages in the workspace, rather than just the one you've selected. This could eliminate this particular problem. In the meantime, we mitigate this with heavy use of https://doc.rust-lang.org/cargo/reference/workspaces.html#the-dependencies-table[workspace dependencies], which helps make sure that different packages _within_ Omicron depend on the same set of features for a given dependency.

=== Why am I getting compile errors after I thought I'd already built everything?

Say you're iterating on code, running `cargo build -p nexus-db-model` to build just that package. You work through lots of compiler errors until finally it works. Now you run tests: `cargo test -p nexus-db-model`. Now you see a bunch of compiler errors again! What gives?

By default, Cargo does not operate on the tests. Cargo's check/build/clippy commands ignore them. This is another reason we suggest using `--all-targets` most of the time.

=== Generated Service Clients and Updating

Expand Down Expand Up @@ -118,24 +193,53 @@ documents being checked in.

In general, changes any service API **require the following set of build steps**:

* Make changes to the service API
* Build the package for the modified service alone. This can be done by changing
directories there, or `cargo build -p <package>`. This is step is important,
to avoid the circular dependency at this point. One needs to update this one
OpenAPI document, without rebuilding the other components that depend on a
now-outdated spec.
* Update the OpenAPI document by running the relevant test with overwrite set:
`EXPECTORATE=overwrite cargo test test_nexus_openapi_internal` (changing the
test name as necessary)
* This will cause the generated client to be updated which may break the build
for dependent consumers
* Modify any dependent services to fix calls to the generated client
. Make changes to the service API.
. Update the OpenAPI document by running the relevant test with overwrite set:
`EXPECTORATE=overwrite cargo test -p <package> -- test_nexus_openapi_internal`
(changing the package name and test name as necessary). It's important to do
this _before_ the next step.
. This will cause the generated client to be updated which may break the build
for dependent consumers.
. Modify any dependent services to fix calls to the generated client.

Note that if you make changes to both Nexus and Sled Agent simultaneously, you
may end up in a spot where neither can build and therefore neither OpenAPI
document can be generated. In this case, revert or comment out changes in one
so that the OpenAPI document can be generated.

This is a particular problem if you find yourself resolving merge conflicts in the generated files. You have basically two options for this:

* Resolve the merge conflicts by hand. This is usually not too bad in practice.
* Take the upstream copy of the file, back out your client side changes (`git stash` and its `-p` option can be helpful for this), follow the steps above to regenerate the file using the automated test, and finally re-apply your changes to the client side. This is essentially getting yourself back to step 1 above and then following the procedure above.

=== Resolving merge conflicts in Cargo.lock

When pulling in new changes from upstream "main", you may find conflicts in Cargo.lock. The easiest way to deal with these is usually to take the upstream changes as-is, then trigger any Cargo operation that updates the lockfile. `cargo metadata` is a quick one. Here's an example:

```
# Pull in changes from upstream "main"
$ git fetch
$ git merge origin/main

# Oh no! We've got conflicts in Cargo.lock. First, let's just take what's upstream:
$ git show origin/main:Cargo.lock > Cargo.lock

# Now, run any command that causes Cargo to update the lock file as needed.
$ cargo metadata > /dev/null
```

When you do this, Cargo makes only changes to Cargo.lock that are necessary based on the various Cargo.toml files in the workspace and dependencies.

Here are things you _don't_ want to do to resolve this conflict:

* Run `cargo generate-lockfile` to generate a new lock file from scratch.
* Remove `Cargo.lock` and let Cargo regenerate it from scratch.

Both of these will cause Cargo to make many more changes (relative to "main") than necessary because it's choosing the latest version of all dependencies in the whole tree. You'll be inadvertently updating all of Omicron's transitive dependencies. (You might conceivably want that. But usually we update dependencies either as-needed for a particular change or via individual PRs via dependabot, not all at once because someone had to merge Cargo.lock.)

You can also resolve conflicts by hand. It's tedious and error-prone.


== Configuration reference

`nexus` requires a TOML configuration file. There's an example in
Expand Down

0 comments on commit cab0925

Please sign in to comment.