-
Notifications
You must be signed in to change notification settings - Fork 435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow Service Brokers to indicate the state of a Service Instance after a failed update or deprovisioning #637
base: master
Are you sure you want to change the base?
Conversation
…er a failed update or deprovisioning
✅ Hey fmui! The commit authors and yourself have already signed the CLA. |
spec.md
Outdated
@@ -328,6 +328,9 @@ For error responses, the following fields are defined: | |||
| --- | --- | --- | | |||
| error | string | A single word in camel case that uniquely identifies the error condition. If present, MUST be a non-empty string. | | |||
| description | string | A user-facing error message explaining why the request failed. If present, MUST be a non-empty string. | | |||
| instance_usable | boolean | If an update or deprovisioning operation failed, this flag indicates whether or not the Service Instance is still usable. If `true`, the Service Instance can still be used, `false` otherwise. This field MUST NOT be present for errors of other operations. Defaults to true. | | |||
| update_repeatable | boolean | If an update operation failed, this flag indicates whether this update can be repeated or not. If `true`, the same update operation MAY be repeated and MAY succeed; if `false`, repeating the same update operation will fail again. This field MUST NOT be present for errors of other operations. Defaults to true. | | |||
| retry_delay | integer | This field suggests how long (in seconds) the Platform SHOULD wait until it repeats the operation. If this a negative number, the Platform SHOULD NOT automatically repeat the operation. Defaults to 0 seconds. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a retry after
concept in the last operation
endpoint. Do you think it makes sense to use the same name for this as well, or do you think it needs to be distinct?
@@ -772,15 +775,21 @@ For success responses, the following fields are defined: | |||
| Response Field | Type | Description | | |||
| --- | --- | --- | | |||
| state* | string | Valid values are `in progress`, `succeeded`, and `failed`. While `"state": "in progress"`, the Platform SHOULD continue polling. A response with `"state": "succeeded"` or `"state": "failed"` MUST cause the Platform to cease polling. | | |||
| description | string | A user-facing message that can be used to tell the user details about the status of the operation. If present, MUST be a non-empty string. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The description field appears twice in this section. Conflict resolution artifact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good to me except for the duplication of description
field question.
About retry delay
vs retry after
I don’t have a strong opinion.
spec.md
Outdated
| description | string | A user-facing message that can be used to tell the user details about the status of the operation. If present, MUST be a non-empty string. | | ||
| instance_usable | boolean | If an update or deprovisioning operation failed, this flag indicates whether or not the Service Instance is still usable. If `true`, the Service Instance can still be used, `false` otherwise. This field MUST NOT be present for errors of other operations. Defaults to true. | | ||
| update_repeatable | boolean | If an update operation failed, this flag indicates whether this update can be repeated or not. If `true`, the same update operation MAY be repeated and MAY succeed; if `false`, repeating the same update operation will fail again. This field MUST NOT be present for errors of other operations. Defaults to true. | | ||
| retry_delay | integer | If an operation failed, this field suggests how long (in seconds) the Platform SHOULD wait until it repeats the operation. If this a negative number, the Platform SHOULD NOT automatically repeat the operation. Defaults to 0 seconds. | | ||
| description | string | A user-facing message that can be used to tell the user details about the status of the operation. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate field here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
@@ -328,6 +328,9 @@ For error responses, the following fields are defined: | |||
| --- | --- | --- | | |||
| error | string | A single word in camel case that uniquely identifies the error condition. If present, MUST be a non-empty string. | | |||
| description | string | A user-facing error message explaining why the request failed. If present, MUST be a non-empty string. | | |||
| instance_usable | boolean | If an update or deprovisioning operation failed, this flag indicates whether or not the Service Instance is still usable. If `true`, the Service Instance can still be used, `false` otherwise. This field MUST NOT be present for errors of other operations. Defaults to true. | | |||
| update_repeatable | boolean | If an update operation failed, this flag indicates whether this update can be repeated or not. If `true`, the same update operation MAY be repeated and MAY succeed; if `false`, repeating the same update operation will fail again. This field MUST NOT be present for errors of other operations. Defaults to true. | | |||
| retry_after | integer | This field suggests how long (in seconds) the Platform SHOULD wait until it repeats the operation. If this a negative number, the Platform SHOULD NOT automatically repeat the operation. Defaults to 0 seconds. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not getting the negative number bit for retry_after
, is that indicating the same thing as update_repeatable
is false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a difference. update_repeatable
indicates wether a repeated update has a chance to succeed or not. A negative retry_after
value indicates that even if a subsequent update could succeed, the platform should not automatically retry it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I find the concept about embedding some payload in an error, apart from the code and a message, a bit odd.
For me there is either an error or not. Making some kind of interpretation about whether a thing is more or less an error is error prone itself.
| description | string | A user-facing message that can be used to tell the user details about the status of the operation. If present, MUST be a non-empty string. | | ||
| instance_usable | boolean | If an update or deprovisioning operation failed, this flag indicates whether or not the Service Instance is still usable. If `true`, the Service Instance can still be used, `false` otherwise. This field MUST NOT be present for errors of other operations. Defaults to true. | | ||
| update_repeatable | boolean | If an update operation failed, this flag indicates whether this update can be repeated or not. If `true`, the same update operation MAY be repeated and MAY succeed; if `false`, repeating the same update operation will fail again. This field MUST NOT be present for errors of other operations. Defaults to true. | | ||
| retry_after | integer | If an operation failed, this field suggests how long (in seconds) the Platform SHOULD wait until it repeats the operation. If this a negative number, the Platform SHOULD NOT automatically repeat the operation. Defaults to 0 seconds. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we have retry_after
as a new field in the body and not using the Retry-After
header as per #621?
Maybe the second sentence should be If this is a negative number...
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@georgi-lozev the retry_after
field if for when the update operation itself has to be retried whereas the Retry-After
is to retry the last operation polling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I get the idea.
My question is why we won't re-use the same mechanism with the header to retry the update operation, but we introduce a new one, a field that is part of the body.
To me it looks like we can return a Retry-After
header as part of the update operation to instruct the platform retrying after a given interval, same way as in last operation polling.
I don't think it's a showstopper, it's more kind of a nitpicking and aiming for some kind of consistency between implementing similar behaviors in the spec.
@georgi-lozev I agree. The right approach would be to return these states in the response of the fetch endpoint. But there are brokers that cannot provide a fetch endpoint. The only way I can think of that works for all brokers to transport this information is the error response. Do you have a better idea? |
This seems reasonable from a k8s perspective. We'd have to plumb this through to our controller, but most of these values already have corresponding values in our reconciliation logic that these could be plugged into |
@fmui Do we still need this PR? |
When an instance update or delete fails, the platform doesn't know the state of the instance and doesn't know if the operation can be repeated.
This PR allows brokers to provide this additional information to the platform in case of a failure.
This PR replaces PR #570.