-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace a downstairs via a new VCR #863
Conversation
Made each field of the VolumeConstructionRequest enum a standalone structure of itself. Added a volume layer VCR target (downstairs) replacement method. This will validate an old/new VCR and then call the underlying replace_downstairs method that exists now. Created a bunch of new tests to validate VCR differences are noticed.
Notes to self: |
Answer to self: We don't need to update self. The Volume holds a Note that whomever calls this VCR update, must use the new VCR going forward. |
upstairs/src/volume.rs
Outdated
VolumeConstructionRequest::Url { .. } => { | ||
crucible_bail!( | ||
ReplaceRequestInvalid, | ||
"Replacement URL VCR invalid " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix
upstairs/src/volume.rs
Outdated
// | ||
// On success, we record the old and new downstairs target and return | ||
// those to the caller. | ||
pub async fn compare_vcr_for_replacement( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A high-level thought: would it make sense to use this during live migration to ensure that two Crucible backends have been configured compatibly? (You'd have the old VCR on the source and the new VCR on the target; the source would send its VCR to the target; the target could then verify that its VCR is plausibly compatible with the one on the source.) We'd have a little work to do in Propolis to enable this but it's nothing complicated.
The reason I ask is that if this could be used this way, it may be nicer to phrase the routine name and error messages throughout in terms of "compatibility" instead of "replaceability."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be a more generic check, with a little more polish.
For a migration, the generation on the new side should be > than the old, which we check here. For a migration, the read only parent could be None
on the new side, which we check here. I'm assuming we don't want to migrate and replace a downstairs at the same time (at least initailly).
To make this generic, it would have to differentiate between the "okay for migration" VCR and "ok for downstairs replacement" VCR.
I think a way to do that would be to return Ok(None)
if the VCRs are the same (given the above constraints) and 'Ok(Some(old_target, new_target))` if the VCRs are the same with the exception of one target being different.
How does that sound?
&self, | ||
original: VolumeConstructionRequest, | ||
replacement: VolumeConstructionRequest, | ||
log: &Logger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the interface Propolis is going to call? It looks great, except that we don't have a logger handy in the CrucibleBackend
, because the backend doesn't hang onto the logger that it passed down to the upstairs in Volume::construct
. Can the top-level Volume hold onto that logger in addition to passing it down to the subvolumes so that the caller doesn't have to construct a new one here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this interface was what I was hoping Propolis could use.
I've wanted a few times to add a Logger
field to the volume layer, but so far have not done so. Maybe this is the time to do that. Let me see if I can figure out how much work it is to do that. I think since we have updated Propolis to supply a logger at Volume::construct
, crucible can attach it there. It's really all the test code I'll have to change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments!
I've tried to answer them, and I'm going to work on some suggestions you had and I'll push up changes when I have that work done.
upstairs/src/volume.rs
Outdated
// | ||
// On success, we record the old and new downstairs target and return | ||
// those to the caller. | ||
pub async fn compare_vcr_for_replacement( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be a more generic check, with a little more polish.
For a migration, the generation on the new side should be > than the old, which we check here. For a migration, the read only parent could be None
on the new side, which we check here. I'm assuming we don't want to migrate and replace a downstairs at the same time (at least initailly).
To make this generic, it would have to differentiate between the "okay for migration" VCR and "ok for downstairs replacement" VCR.
I think a way to do that would be to return Ok(None)
if the VCRs are the same (given the above constraints) and 'Ok(Some(old_target, new_target))` if the VCRs are the same with the exception of one target being different.
How does that sound?
upstairs/src/volume.rs
Outdated
VolumeConstructionRequest::Url { .. } => { | ||
crucible_bail!( | ||
ReplaceRequestInvalid, | ||
"Replacement URL VCR invalid " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix
&self, | ||
original: VolumeConstructionRequest, | ||
replacement: VolumeConstructionRequest, | ||
log: &Logger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this interface was what I was hoping Propolis could use.
I've wanted a few times to add a Logger
field to the volume layer, but so far have not done so. Maybe this is the time to do that. Let me see if I can figure out how much work it is to do that. I think since we have updated Propolis to supply a logger at Volume::construct
, crucible can attach it there. It's really all the test code I'll have to change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've addressed all comments, this is ready for a re-review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, sorry for inciting all those logging changes... One more suggestion about the comparison routines, but otherwise this is looking pretty good.
// On success, we record the old and new downstairs target and return | ||
// those to the caller. | ||
pub async fn compare_vcr_for_replacement( | ||
// We return Ok(None) if requirements 1, 2, 3A, 3B are met. This would |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me, since we only need to return a socket address pair if we're replacing, but it feels a little weird to me to have the caller check for the flavor of Ok
it got back when it could have specified to the callee which checks to do. That is, I think the downstairs-replacement path is going to end up with something like this:
match compare_vcr_for_update(original, replacement, log)? {
Ok(Some(_)) => todo!("okay to proceed"),
Ok(None) => bail!("target replacement check failed"),
}
Might it make sense to have two functions? compare_vcr_for_update
checks 1, 2, 3A, and 3B and returns a Result<()>
; migration calls that one. compare_vcr_for_replacement
calls compare_vcr_for_update
and, if that succeeds, does check 3C, and if that succeeds it returns Result<(SocketAddr, SocketAddr)>
. Then the downstairs replacement code can just let (old_addr, new_addr) = compare_vcr_for_replacement()?;
.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The challenge here is that, and perhaps I need to make it clear in the comment as well, is:
- For migration, I don't want 3C, the targets must be the same.
- For replacement, I do want 3C, the targets must be different.
The compare_vcr_for_replacement
does not build on top of compare_vcr_for_migration
I think I can do it with a wrapper in front of compare_vcr_for_update
, where I just hide the Ok
check like you have shown above and have a migration_check
and a replacement_check
,
migration_check
just looks for Ok(None)
, and replacement_check
wants the Ok(Some(_))
Then I'm at least sweeping the Ok
check under the rug and the callers don't have to match.
I could also make a "check everything except the sub_volume
targets" function (what is common between the replace/migrate cases). If I did that I would have to make another two functions that then did the translation/selection from VolumeConstructionRequest
-> VcrVolume
-> SubVolume
-> Target Vec, and returned the Ok/Err depending on what they wanted, which would then look something like this:
fn compare_vcr_for_migration(..) -> Result<()> {
compare_vcr_common(..)?
compare_vcr_subtarget_the_same()
}
and
fn compare_vcr_for_migration(..) -> Result<(SocketAddr, SocketAddr)> {
compare_vcr_common(..)?
compare_vcr_subtarget_one_different()
}
Or, maybe I make a give_me_the_subvolume_target_from_this_vcr
function, then compare the two targets after I have done the compare_vcr_common
, but.. I don't know what ends up being the better solution here. I kind of like just hiding the Ok check.
Let me know if you feel strongly about either of these two solutions (or some third idea). I do think it's worth updating this to be a little better and having two functions to call depending on what the caller wants, but I need a little direction on which path you think might be better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and tried option A (the wrapper around compare_vcr_for_update
)
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ship it!
This adds support for replacing a crucible downstairs from the Volume layer by submitting a current and
new
VolumeConstructionRequest
(VCR). If the new VCR matches the current with only a few allowed differences(gen, a single new target) then the guest layer
replace_downstairs
is called with the old and new downstairs targets.To make this code a little more readable, new structure types were created for each of the enums in the
VolumeConstructionRequest
, and theVolumeConstructionRequest
now uses the new types.These changes can be seen in
crucible-client-types/src/lib.rs
, and that's probably a good place to start.A bunch of new tests were created to verify the old/new VCRs only differ in the required ways, and any other
differences are blocked.
The meat of the changes here are the new
compare_vcr_for_replacement()
call that compares two VCRs and, if itfinds them to match enough, returns the old and new target downstairs. This can then be passed on to the
existing
replace_downstairs()
call that will tell the upstairs to perform the replacement.There is a fair chunk of code that is just testing all the possible differences that two VCRs might have, which is sadly
like 85% the same for each test. I'm happy to take suggestions on how to turn that into less code if possible, but
trying to keep a balance such that it is pretty clear as to what the test was doing.