Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(expansion): add an error message to guide users in resolving the stuck volume expansion cancellation #3168

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

derekbit
Copy link
Member

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9466

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

…volume expansion cancellation

Longhorn 9466

Signed-off-by: Derek Su <[email protected]>
@@ -574,6 +568,17 @@ func (m *VolumeManager) CancelExpansion(volumeName string) (v *longhorn.Volume,
engine = e
}

if !v.Status.ExpansionRequired {
if v.Spec.Size == engine.Spec.VolumeSize && v.Spec.Size != engine.Status.CurrentSize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible?

The intermediate state that causes the bug is:

  1. The expansion is already started and finished in the engine process level, which means v.Spec.Size == engine.Spec.VolumeSize.
  2. Longhorn has not updated engine/volume CR status. In other words, v.Spec.Size != engine.Status.CurrentSize and v.Status.ExpansionRequired == true. (I am not sure if engine.Status.IsExpanding can be false here)

To be honest, it is hard to capture this state then reject the requests in API layer.

Copy link
Member Author

@derekbit derekbit Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right.

It is caused by it as I mentioned in longhorn/longhorn#9466 (comment).

Do you think the error message (hint how to resolve the issue) is reasonable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message itself is reasonable. But my question is, how can v.Status.ExpansionRequired become false (or can the new condition be satisfied) before VolumeManager.CancelExpansion execution complete?
The only place of adding the hint is probably here. And currently, Longhorn has no way to figure out if there is an cancellation before this error is trigger, the error hint can be only like:

the expected size %v of engine %v should not be smaller than the current size %v. It can be caused by a cancellation request after the expansion is complete, please re-do the expansion to resolve this error.

Copy link
Member

@innobead innobead Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a general problem for canceling any async operations relying on the resource status which is regularly updated but not in real time.

Is it possible to get the runtime info from the engine directly when calling the cancellation to decide whether the volume is in expansion or not?

Copy link
Contributor

@shuo-wu shuo-wu Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get the runtime info from the engine directly when calling the cancellation to decide whether the volume is in expansion

It is not an atomic operation hence there can be a race anyway. For now, I think adding warnings/notifications is enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants