-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(expansion): add an error message to guide users in resolving the stuck volume expansion cancellation #3168
base: master
Are you sure you want to change the base?
Conversation
8173d64
to
d1f2e42
Compare
…volume expansion cancellation Longhorn 9466 Signed-off-by: Derek Su <[email protected]>
d1f2e42
to
0c05794
Compare
@@ -574,6 +568,17 @@ func (m *VolumeManager) CancelExpansion(volumeName string) (v *longhorn.Volume, | |||
engine = e | |||
} | |||
|
|||
if !v.Status.ExpansionRequired { | |||
if v.Spec.Size == engine.Spec.VolumeSize && v.Spec.Size != engine.Status.CurrentSize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible?
The intermediate state that causes the bug is:
- The expansion is already started and finished in the engine process level, which means
v.Spec.Size == engine.Spec.VolumeSize
. - Longhorn has not updated engine/volume CR status. In other words,
v.Spec.Size != engine.Status.CurrentSize
andv.Status.ExpansionRequired == true
. (I am not sure ifengine.Status.IsExpanding
can be false here)
To be honest, it is hard to capture this state then reject the requests in API layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right.
It is caused by it as I mentioned in longhorn/longhorn#9466 (comment).
Do you think the error message (hint how to resolve the issue) is reasonable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message itself is reasonable. But my question is, how can v.Status.ExpansionRequired
become false (or can the new condition be satisfied) before VolumeManager.CancelExpansion
execution complete?
The only place of adding the hint is probably here. And currently, Longhorn has no way to figure out if there is an cancellation before this error is trigger, the error hint can be only like:
the expected size %v of engine %v should not be smaller than the current size %v. It can be caused by a cancellation request after the expansion is complete, please re-do the expansion to resolve this error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a general problem for canceling any async operations relying on the resource status which is regularly updated but not in real time.
Is it possible to get the runtime info from the engine directly when calling the cancellation to decide whether the volume is in expansion or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get the runtime info from the engine directly when calling the cancellation to decide whether the volume is in expansion
It is not an atomic operation hence there can be a race anyway. For now, I think adding warnings/notifications is enough.
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9466
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context