Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More crutest and dsc updates to support Volume layer activities. #1472

Merged
merged 4 commits into from
Sep 26, 2024

Conversation

leftwo
Copy link
Contributor

@leftwo leftwo commented Sep 25, 2024

This adds support for crutest to use a provided dsc endpoint to construct a Volume object.

These changes should be a fix for both issues:
#1451 and #1457

Moved the existing volume creation steps to a new function, and added another option on how we can create a Volume. The two previous ways of creating a volume are not changed (though I changed a log message and added some warnings). The new code is in taking the dsc provided endpoint and using that to construct a volume.

Additional dsc changes were made to help provide Volume info.
Renamed things in dsc to better reflect what information they hold. Specifically, update a bunch of region set comments, as dsc just controls crucible-downstairs processes, and does not know which ones are part of what region set.

New dsc commands

  • get_ds_uuid: Returns the UUID for the given client ID
  • all_running: Returns true if all downstairs that dsc knows about are currently in Running state.
  • get_region_count: Returns the total number of regions that dsc knows about.

New dsc behavior

Other changes
tools/test_replay.sh transitioned to using the new --dsc option, as that test already required a dsc endpoint and was using a hard coded default value for it.

tools/test_restart_repair.sh was updated to wait for dsc to report that all downstairs are online after a restart. This avoids a race where we told dsc to start, and then start crutest, but the downstairs are not yet online.

All the tests that use dsc will eventually transition to using it to construct a Volume, but I'm pushing that work to another PR.

There are more changes coming, specifically

Other work this enables

  • Making tests that use multiple sub-volumes.
  • Layering the current set of tests in a way so we can run the same tests with a single sub-volume and with multiple sub-volumes

These changes should be a fix for both issues:
#1451
#1457

This adds support for crutest to use a provided dsc endpoint to construct
a Volume object.

Moved the existing volume creation steps to a new function, and added another
option on how we can create a Volume.  The two previous ways of creating
a volume are not changed (though I changed a log message and added some
warnings).  The new code is in taking the dsc provided endpoint and
using that to construct a volume.

Additional dsc changes were made to help provide Volume info.

Renamed things in dsc to better reflect what information they hold.
Specifically, update a bunch of region set comments, as dsc just controls
crucible-downstairs processes, and does not know which ones are part of
what region set.

New dsc commands:
get_ds_uuid: Returns the UUID for the given client ID.
all_running: Returns true if all downstairs that dsc knows about are currently
in Running state.
get_region_count: Returns the total number of regions that dsc knows about.

New dsc behavior.
dsc will now wait on all downstairs starting before taking any commands.
The ability for dsc to answer a request can be used by a test to confirm that
all downstairs had started.
Add the ability to supply a dsc endpoint to crutest-cli
#1459

tools/test_replay.sh transitioned to using the new --dsc option, as that
test already required a dsc endpoint and was using a hard coded default
value for it.

tools/test_restart_repair.sh was updated to wait for dsc to report that
all downstairs are online after a restart.  This avoids a race where
we told dsc to start, and then start crutest, but the downstairs are not
yet online.

All the tests that use dsc will eventually transition to using it to
construct a Volume, but I'm pushing that work to another PR.

There are more changes coming, specifically:
Updating replace-before-acive and replace-reconcile to use the new
dsc option correctly instead of a default.
Updating tools/test_* to not require targets for crutest and instead use
dsc option.
More updates in crutest to update the current RegionInfo struct to be
aware of multiple sub-volumes.
#1454

Possibly some updates to the BlockIO trait.
#1455

Other work this enables
Making tests that use multiple sub-volumes.
Layering the current set of tests in a way so we can run the same tests

with a single sub-volume and with multiple sub-volumes
} else {
println!("dsc: Need to be connected first");
match dsc_cmd {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you write this as

} else if let DscCommand::Connect { server } = dsc_cmd {

instead of the nested match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's much better. Done in 11fb248

@@ -139,7 +137,7 @@ enum Workload {
ReplaceBeforeActive {
/// URL location of the running dsc server
#[clap(long, default_value = "http://127.0.0.1:9998", action)]
dsc: String,
dsc_str: String,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use the new global dsc argument here (and in ReplaceReconcile)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can and will! Coming in my next PR.
It also requires some changes to the tests themselves, and I felt this PR had enough in it.

if !opt.target.is_empty() {
warn!(test_log, "targets are ignored when VCR is provided");
}
let volume = Volume::construct(vcr, pr, block_io_logger).await.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we warn or error if dsc is provided here?

Personally, I would vote for errors (including for the lines that are warnings above), since invalid CLI arguments are probably a sign that the user is doing something wrong (and no one reads the logs).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are not quite ready to make these hard errors yet, though you are right that no one reads the logs, unless there is a failure. This could lead to someone thinking they are testing one thing, when in fact they are testing another, which is not good.

My next PR (yes, I'm using that again) has more changes that will get us closer to not
needing both targets and a VCR file. The

There are some cases where we use the targets provided for building test specific args. The ReplaceBeforeActivetest you pointed to above is one of these cases.

I've made an issue to come back and harden up the errors here.

For dsc specifically, we could, in theory, create a volume from a VCR file, then use DSC to connect and control the downstairs themselves. I'm trying to figure out how we will support having sub-volumes with different sizes and how to ingest that into crutest, and I was able to do that by layering a few dsc create commands in a way that did it.
Another idea is to support multiple dsc endopints. I'm still not sure which way is better.

let res = dsc_client.dsc_get_region_count().await.unwrap();
let regions = res.into_inner();

let sv_count = regions / 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error if sv_count is 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 11fb248

let mut extent_info_result = None;
for target in &crucible_opts.target {
let port = target.port() + crucible_common::REPAIR_PORT_OFFSET;
println!("look at: http://{}:{} ", target.ip(), port);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

info!(test_log, ...) instead of println!?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot I had a logger here (or possibly when I wrote this before I moved it)
Fixed in 11fb248

let repair_client = repair_client::new(&repair_url);
match repair_client.get_region_info().await {
Ok(ri) => {
println!("RI is: {:?}", ri);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 11fb248

break;
}
Err(e) => {
println!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here (warn! perhaps?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I consider myself warned in 11fb248

Copy link
Contributor

@mkeeter mkeeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small comments, but generally LGTM

@leftwo
Copy link
Contributor Author

leftwo commented Sep 26, 2024

If anyone else want's to comment (or has comments in progress) let me know, otherwise I'll merge this
when the CI tests pass.

@leftwo leftwo merged commit 826e820 into main Sep 26, 2024
18 checks passed
@leftwo leftwo deleted the alan/crutest-meet-dsc branch September 26, 2024 20:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

crutest cli does not let you change from the default dsc server
2 participants