api: one ensure API to rule them all #813

Merged (3 commits) on Dec 18, 2024

gjcolombo
Contributor

Stacked on #810. Part of #804.

Remove the existing instance ensure APIs and collapse them into a single API that takes a set of instance properties and a mechanism for creating the instance: either creating a new VM from a spec or initializing as a live migration target. In the latter case, the client can also supply a list of instance spec components that should replace the corresponding components in the source's spec. This gives Omicron a way to replace backend information (Crucible generation numbers, host VNIC names) that changes over live migration.
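
As a rough illustration of the request shape described above, here is a minimal Rust sketch. Every name in it (`EnsureRequest`, `InitMethod`, `InstanceProperties`, `describe`, and the string-valued `Spec` map) is a hypothetical stand-in chosen for this example, not the actual definition in `propolis_api_types`.

```rust
use std::collections::BTreeMap;
use std::net::SocketAddr;

/// Hypothetical stand-in for an instance spec: component name -> configuration.
type Spec = BTreeMap<String, String>;

/// Properties shared by every ensure request (illustrative field only).
#[derive(Debug, Clone, PartialEq)]
struct InstanceProperties {
    name: String,
}

/// The two ways an instance can be created under a single ensure API.
#[derive(Debug, Clone, PartialEq)]
enum InitMethod {
    /// Create a brand-new VM from a complete spec.
    Spec { spec: Spec },
    /// Initialize as a live migration target. The spec comes from the
    /// source; the client may supply components that replace entries in it.
    MigrationTarget {
        src_addr: SocketAddr,
        replace_components: Spec,
    },
}

/// One request type covers both cases.
#[derive(Debug, Clone, PartialEq)]
struct EnsureRequest {
    properties: InstanceProperties,
    init: InitMethod,
}

/// Summarize what a server would do with a request (for illustration only).
fn describe(req: &EnsureRequest) -> String {
    match &req.init {
        InitMethod::Spec { spec } => format!(
            "create {} from a spec with {} component(s)",
            req.properties.name,
            spec.len()
        ),
        InitMethod::MigrationTarget { src_addr, replace_components } => format!(
            "migrate {} in from {}, replacing {} component(s)",
            req.properties.name,
            src_addr,
            replace_components.len()
        ),
    }
}
```

Modeling the creation mechanism as an enum means a request cannot carry both a full spec and migration parameters at once, which is one plausible motivation for collapsing the endpoints.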

Remove all of the instance ensure types that supported the old ensure interface. One notable consequence: the `DiskRequest` type was what caused propolis-server's versions of Crucible's `VolumeConstructionRequest` and `CrucibleOpts` types to appear in the generated client, so removing it removes them as well. Omicron code tends to pull these types in from propolis-client (often transitively through sled-agent-client). This is a useful way of making sure Omicron components serialize/deserialize VCRs the same way Propolis does, so preserve it by re-exporting these types from propolis-client.
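
The re-export approach can be sketched in a single file by using modules as stand-ins for crates. The module names mirror the crates mentioned above, but the type bodies are pared-down placeholders (the fields shown are assumptions for illustration), and `reexport_is_same_type` and `generation_of` are hypothetical helpers.

```rust
use std::any::TypeId;

/// Stand-in for the upstream crate that owns the canonical definitions.
mod crucible_client_types {
    #[derive(Debug, Clone, PartialEq)]
    pub struct CrucibleOpts {
        pub read_only: bool, // placeholder field
    }

    #[derive(Debug, Clone, PartialEq)]
    pub enum VolumeConstructionRequest {
        // Placeholder variant; the real type has more variants and fields.
        Region { generation: u64, opts: CrucibleOpts },
    }
}

/// Stand-in for the client crate: it defines nothing new and only
/// re-exports the upstream types under its own path.
mod propolis_client {
    pub use super::crucible_client_types::{CrucibleOpts, VolumeConstructionRequest};
}

/// A re-export is the same type, not a structurally similar copy.
fn reexport_is_same_type() -> bool {
    TypeId::of::<propolis_client::VolumeConstructionRequest>()
        == TypeId::of::<crucible_client_types::VolumeConstructionRequest>()
}

/// Downstream code can keep naming the type through the client path.
fn generation_of(vcr: &propolis_client::VolumeConstructionRequest) -> u64 {
    match vcr {
        propolis_client::VolumeConstructionRequest::Region { generation, .. } => *generation,
    }
}
```

Because both paths name one type, any serialization code attached to it is shared by every consumer, whereas a generated copy would be a distinct type that only happens to have the same shape.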

Remove the parts of the mock server that validated disk, NIC, and cloud-init volume requests. The Omicron tests that use the mock server primarily check that its serial console interface works as expected and don't heavily exercise these checks. (Omicron CI's end-to-end test will fail if sled-agent mishandles the Propolis ensure API.)

Finally, remove a few other unused types from the propolis_api_types crate.

Tests: cargo test, PHD, driving ad hoc VMs with propolis-cli.

Member

@hawkw left a comment

This is actually a lot more straightforward than I thought it was gonna be, looks good!

crates/propolis-api-types/src/lib.rs (resolved)
@gjcolombo force-pushed the gjcolombo/death-of-a-config-file branch from 986264a to 8c0f774 on December 2, 2024 20:55
@gjcolombo force-pushed the gjcolombo/one-ensure-api branch from 91023e4 to d8967c7 on December 2, 2024 22:13
@gjcolombo force-pushed the gjcolombo/death-of-a-config-file branch from 8c0f774 to fb4b081 on December 17, 2024 18:43
Base automatically changed from gjcolombo/death-of-a-config-file to master on December 18, 2024 00:06
@gjcolombo force-pushed the gjcolombo/one-ensure-api branch from a92ffd7 to 7a9bbc0 on December 18, 2024 00:13
Comment on lines 64 to 73
/// - Crucible disks: After migrating, the target Propolis presents itself as a
///   new client of the Crucible downstairs servers backing the VM's disks.
///   Crucible requires the target to present a newer client generation number
///   to allow the target to connect. In a full Oxide deployment, these numbers
///   are managed by the control plane (i.e. it is not safe for Propolis to
///   manage these values directly--new Crucible volume connection information
///   must always come from Nexus).
/// - Virtio network devices: Each virtio NIC in the guest needs to bind to a
///   named VNIC object on the host. These names can change when a VM migrates
///   from host to host.
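
The two replacement cases in this doc comment can be sketched as a merge of client-supplied components over the source's spec. This is an illustration under assumptions, not the server's implementation: `Component`, `Spec`, and `apply_replacements` are hypothetical names, and the choice to reject replacements that name an unknown component or change a component's kind is an assumed policy.

```rust
use std::collections::BTreeMap;
use std::mem::discriminant;

/// Hypothetical component kinds; only the fields discussed above are modeled.
#[derive(Debug, Clone, PartialEq)]
enum Component {
    CrucibleBackend { generation: u64 },
    VirtioNicBackend { vnic_name: String },
    Other(String),
}

/// Hypothetical spec: component name -> component.
type Spec = BTreeMap<String, Component>;

/// Build the target's spec by overlaying replacements on the source's spec.
/// Assumed policy: a replacement must name a component that exists in the
/// source and must be of the same kind as the component it replaces.
fn apply_replacements(source: &Spec, replacements: &Spec) -> Result<Spec, String> {
    let mut target = source.clone();
    for (name, new) in replacements {
        match target.get_mut(name) {
            Some(old) if discriminant(old) == discriminant(new) => *old = new.clone(),
            Some(_) => {
                return Err(format!("component {name} has a different kind in the source spec"))
            }
            None => return Err(format!("component {name} is not in the source spec")),
        }
    }
    Ok(target)
}
```

In this sketch a control plane would send only the entries that change across migration, for example a Crucible backend with a newer generation number or a NIC backend with the destination host's VNIC name, and every other component carries over from the source unchanged.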
Member

this is lovely!

@gjcolombo merged commit 2216a2b into master on Dec 18, 2024
10 of 11 checks passed
@gjcolombo deleted the gjcolombo/one-ensure-api branch on December 18, 2024 00:50
gjcolombo added a commit to oxidecomputer/omicron that referenced this pull request Dec 20, 2024
Update Omicron to use the new Propolis VM creation API defined in
oxidecomputer/propolis#813 and oxidecomputer/propolis#816. This API
requires clients to pass instance specs to create new VMs and component
replacement lists to migrate existing VMs. Construct these in sled agent
for now; in the future this logic can move to Nexus and become part of a
robust virtual platform abstraction. The immediate goal is just to keep
everything working for existing VMs while adapting to the new Propolis
API.

Slightly adjust the sled agent instance APIs so that Nexus specifies
disks and boot orderings using sled-agent-defined types and not
re-exported Propolis types.

Finally, adapt Nexus to the fact that Crucible's
`VolumeConstructionRequest` and `CrucibleOpts` types no longer appear in
Propolis's generated client (and so don't appear in sled agent's client
either). Instead, the `propolis-client` crate re-exports these types
directly from its `crucible-client-types` dependency. For the most part,
this involves updating `use` directives and storing `SocketAddr`s in
their natively typed form instead of converting them to and from
strings.
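
A small sketch of the "natively typed" point, using only the standard library: addresses are parsed once at the boundary and held as `SocketAddr` values afterwards. `parse_targets` and `ports` are hypothetical helpers for illustration, not Nexus code.

```rust
use std::net::{AddrParseError, SocketAddr};

/// Parse address strings once, at the boundary where they enter the system.
/// Collecting into a `Result` stops at the first malformed entry.
fn parse_targets(raw: &[&str]) -> Result<Vec<SocketAddr>, AddrParseError> {
    raw.iter().map(|s| s.parse::<SocketAddr>()).collect()
}

/// Code that holds typed addresses can use them directly, with no
/// re-parsing and no failure path.
fn ports(targets: &[SocketAddr]) -> Vec<u16> {
    targets.iter().map(|addr| addr.port()).collect()
}
```

Holding `SocketAddr` values moves validation to a single place, so later code cannot encounter a malformed address string.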

Tests: cargo nextest, plus ad hoc testing in a dev cluster as described
in the PR comments.