Allow Service Brokers to indicate the state of a Service Instance after a failed update or deprovisioning #637

fmui · 2019-02-18T13:18:54Z

When an instance update or delete fails, the platform doesn't know the state of the instance and doesn't know if the operation can be repeated.
This PR allows brokers to provide this additional information to the platform in case of a failure.

This PR replaces PR #570.


cfdreddbot · 2019-02-18T13:18:56Z

✅ Hey fmui! The commit authors and yourself have already signed the CLA.

waterlink · 2019-03-07T13:05:49Z

spec.md

@@ -328,6 +328,9 @@ For error responses, the following fields are defined:
 | --- | --- | --- |
 | error | string | A single word in camel case that uniquely identifies the error condition. If present, MUST be a non-empty string. |
 | description | string | A user-facing error message explaining why the request failed. If present, MUST be a non-empty string. |
+| instance_usable | boolean | If an update or deprovisioning operation failed, this flag indicates whether or not the Service Instance is still usable. If `true`, the Service Instance can still be used, `false` otherwise. This field MUST NOT be present for errors of other operations. Defaults to true. |
+| update_repeatable | boolean | If an update operation failed, this flag indicates whether this update can be repeated or not. If `true`, the same update operation MAY be repeated and MAY succeed; if `false`, repeating the same update operation will fail again. This field MUST NOT be present for errors of other operations. Defaults to true. |
+| retry_delay | integer | This field suggests how long (in seconds) the Platform SHOULD wait until it repeats the operation. If this a negative number, the Platform SHOULD NOT automatically repeat the operation. Defaults to 0 seconds. |


We already have a retry after concept in the last operation endpoint. Do you think it makes sense to use the same name for this as well, or do you think it needs to be distinct?
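To make the proposed error-body semantics concrete, here is a minimal platform-side sketch. It assumes the field names and defaults exactly as written in the diff above (`instance_usable` and `update_repeatable` default to `true`, `retry_delay` defaults to 0 seconds) and enforces the "MUST NOT be present for errors of other operations" rules. The operation labels, the `BrokerError` type, and the `UpdateFailed` error code are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical operation labels used only in this sketch.
UPDATE = "update"
DEPROVISION = "deprovision"


@dataclass
class BrokerError:
    error: Optional[str]
    description: Optional[str]
    instance_usable: bool = True    # proposed default: true
    update_repeatable: bool = True  # proposed default: true
    retry_delay: int = 0            # proposed default: 0 seconds


def parse_error_response(body: dict, operation: str) -> BrokerError:
    """Apply the proposed field-presence rules and defaults to an error body."""
    # instance_usable is only allowed for failed updates or deprovisions.
    if "instance_usable" in body and operation not in (UPDATE, DEPROVISION):
        raise ValueError("instance_usable MUST NOT be present for this operation")
    # update_repeatable is only allowed for failed updates.
    if "update_repeatable" in body and operation != UPDATE:
        raise ValueError("update_repeatable MUST NOT be present for this operation")
    return BrokerError(
        error=body.get("error"),
        description=body.get("description"),
        instance_usable=body.get("instance_usable", True),
        update_repeatable=body.get("update_repeatable", True),
        retry_delay=body.get("retry_delay", 0),
    )


# Usage: a hypothetical failed plan update that must not be repeated as-is.
err = parse_error_response(
    {"error": "UpdateFailed", "description": "The plan change could not be applied.",
     "update_repeatable": False},
    operation=UPDATE,
)
print(err.instance_usable, err.update_repeatable, err.retry_delay)  # True False 0
```

Omitted fields fall back to the proposed defaults, so a broker that sends only `error` and `description` is read as "instance still usable, update may be repeated, no delay".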

waterlink · 2019-03-07T13:11:09Z

spec.md

@@ -772,15 +775,21 @@ For success responses, the following fields are defined:
 | Response Field | Type | Description |
 | --- | --- | --- |
 | state* | string | Valid values are `in progress`, `succeeded`, and `failed`. While `"state": "in progress"`, the Platform SHOULD continue polling. A response with `"state": "succeeded"` or `"state": "failed"` MUST cause the Platform to cease polling. |
+| description | string | A user-facing message that can be used to tell the user details about the status of the operation. If present, MUST be a non-empty string. |


The description field appears twice in this section. Conflict resolution artifact?

Thanks. Fixed.

Great! Thank you!
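The `state` rule quoted in the hunk above (keep polling while `in progress`, stop on `succeeded` or `failed`) can be sketched as a small polling loop. This is an illustration only: `fetch` stands in for a call to the last operation endpoint and returns the parsed JSON body, and `interval_seconds` and `max_polls` are hypothetical knobs, not spec parameters.

```python
import time
from typing import Callable, Dict, Tuple

TERMINAL_STATES = {"succeeded", "failed"}


def poll_last_operation(
    fetch: Callable[[], Dict],
    interval_seconds: float = 0.0,
    max_polls: int = 100,
) -> Tuple[str, Dict]:
    """Poll until the broker reports a terminal state; return (state, body)."""
    for _ in range(max_polls):
        body = fetch()
        state = body["state"]
        if state in TERMINAL_STATES:
            # "succeeded" or "failed" MUST cause the Platform to cease polling.
            return state, body
        if state != "in progress":
            raise ValueError(f"unexpected state: {state!r}")
        # While "in progress", the Platform SHOULD continue polling.
        time.sleep(interval_seconds)
    raise TimeoutError("operation did not reach a terminal state")


# Usage with canned responses instead of real HTTP calls.
responses = iter([
    {"state": "in progress"},
    {"state": "failed", "description": "The disk resize failed."},
])
state, body = poll_last_operation(lambda: next(responses))
print(state, body.get("description"))  # failed The disk resize failed.
```

The optional `description` is only surfaced once polling stops, which is where a user-facing failure message is most useful.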

waterlink

Everything looks good to me except for the duplicated description field.

About `retry_delay` vs. `retry_after`, I don’t have a strong opinion.

waterlink · 2019-03-07T17:07:48Z

spec.md

+| description | string | A user-facing message that can be used to tell the user details about the status of the operation. If present, MUST be a non-empty string. |
+| instance_usable | boolean | If an update or deprovisioning operation failed, this flag indicates whether or not the Service Instance is still usable. If `true`, the Service Instance can still be used, `false` otherwise. This field MUST NOT be present for errors of other operations. Defaults to true. |
+| update_repeatable | boolean | If an update operation failed, this flag indicates whether this update can be repeated or not. If `true`, the same update operation MAY be repeated and MAY succeed; if `false`, repeating the same update operation will fail again. This field MUST NOT be present for errors of other operations. Defaults to true. |
+| retry_delay | integer | If an operation failed, this field suggests how long (in seconds) the Platform SHOULD wait until it repeats the operation. If this a negative number, the Platform SHOULD NOT automatically repeat the operation. Defaults to 0 seconds. |
 | description | string | A user-facing message that can be used to tell the user details about the status of the operation. |


Duplicate field here.

waterlink

LGTM 👍

tinygrasshopper · 2019-03-12T10:08:58Z

spec.md

@@ -328,6 +328,9 @@ For error responses, the following fields are defined:
 | --- | --- | --- |
 | error | string | A single word in camel case that uniquely identifies the error condition. If present, MUST be a non-empty string. |
 | description | string | A user-facing error message explaining why the request failed. If present, MUST be a non-empty string. |
+| instance_usable | boolean | If an update or deprovisioning operation failed, this flag indicates whether or not the Service Instance is still usable. If `true`, the Service Instance can still be used, `false` otherwise. This field MUST NOT be present for errors of other operations. Defaults to true. |
+| update_repeatable | boolean | If an update operation failed, this flag indicates whether this update can be repeated or not. If `true`, the same update operation MAY be repeated and MAY succeed; if `false`, repeating the same update operation will fail again. This field MUST NOT be present for errors of other operations. Defaults to true. |
+| retry_after | integer | This field suggests how long (in seconds) the Platform SHOULD wait until it repeats the operation. If this a negative number, the Platform SHOULD NOT automatically repeat the operation. Defaults to 0 seconds. |


I don't understand the negative-number case for `retry_after`. Does it indicate the same thing as `update_repeatable` being `false`?

There is a difference. `update_repeatable` indicates whether a repeated update has a chance to succeed. A negative `retry_after` value indicates that, even if a subsequent update could succeed, the Platform should not retry it automatically.
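The distinction between the two fields can be written as a small decision function. This is one possible platform-side reading of the proposal, not spec text: `RetryDecision` and `decide_retry` are hypothetical names, and the defaults mirror the proposed ones (`update_repeatable` is `true`, `retry_after` is 0).

```python
from enum import Enum
from typing import Optional, Tuple


class RetryDecision(Enum):
    DO_NOT_RETRY = "do not retry"            # repeating the same update will fail again
    MANUAL_RETRY_ONLY = "manual retry only"  # a retry may succeed, but not automatically
    AUTO_RETRY = "automatic retry"           # the Platform may repeat after the delay


def decide_retry(
    update_repeatable: bool = True, retry_after: int = 0
) -> Tuple[RetryDecision, Optional[int]]:
    """Map the proposed fields of a failed update to a retry decision and delay."""
    if not update_repeatable:
        # The broker says the same update cannot succeed; the delay is irrelevant.
        return RetryDecision.DO_NOT_RETRY, None
    if retry_after < 0:
        # The update could succeed, but the Platform SHOULD NOT repeat it on its own.
        return RetryDecision.MANUAL_RETRY_ONLY, None
    return RetryDecision.AUTO_RETRY, retry_after


print(decide_retry(update_repeatable=True, retry_after=-1))
```

`update_repeatable: false` dominates: once the broker says the update cannot succeed, any `retry_after` value is ignored.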

georgi-lozev

In general, I find the concept of embedding a payload in an error, beyond the code and the message, a bit odd.
To me, there either is an error or there is not. Interpreting whether something is more or less of an error is itself error-prone.

georgi-lozev · 2019-03-12T12:54:45Z

spec.md

+| description | string | A user-facing message that can be used to tell the user details about the status of the operation. If present, MUST be a non-empty string. |
+| instance_usable | boolean | If an update or deprovisioning operation failed, this flag indicates whether or not the Service Instance is still usable. If `true`, the Service Instance can still be used, `false` otherwise. This field MUST NOT be present for errors of other operations. Defaults to true. |
+| update_repeatable | boolean | If an update operation failed, this flag indicates whether this update can be repeated or not. If `true`, the same update operation MAY be repeated and MAY succeed; if `false`, repeating the same update operation will fail again. This field MUST NOT be present for errors of other operations. Defaults to true. |
+| retry_after | integer | If an operation failed, this field suggests how long (in seconds) the Platform SHOULD wait until it repeats the operation. If this a negative number, the Platform SHOULD NOT automatically repeat the operation. Defaults to 0 seconds. |


Why do we have `retry_after` as a new field in the body instead of using the `Retry-After` header as per #621?

Maybe the second sentence should read "If this is a negative number...".

@georgi-lozev the `retry_after` field is for when the update operation itself has to be retried, whereas `Retry-After` is for retrying the last operation polling.

Yes, I get the idea.

My question is why we don't reuse the same header mechanism to retry the update operation, and instead introduce a new one: a field in the body.
To me it looks like we could return a Retry-After header in the response to the update operation to instruct the platform to retry after a given interval, the same way as in last operation polling.

I don't think it's a showstopper; it's more of a nitpick, aimed at consistency in how the spec implements similar behaviors.
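To compare the two options discussed here, the sketch below reads a standard HTTP `Retry-After` header (which, per the HTTP specification, carries either delta-seconds or an HTTP-date) and falls back to the proposed `retry_after` body field. The helper names and the header-first precedence are assumptions made for illustration; the thread does not define how the two would interact.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def retry_after_seconds(
    headers: Mapping[str, str], now: Optional[datetime] = None
) -> Optional[int]:
    """Return the Retry-After header value in seconds, or None if absent."""
    value = next(
        (v.strip() for k, v in headers.items() if k.lower() == "retry-after"), None
    )
    if value is None:
        return None
    if re.fullmatch(r"[0-9]+", value):
        return int(value)  # delta-seconds form
    # HTTP-date form; raises TypeError/ValueError on malformed dates
    # (the exception type depends on the Python version).
    when = parsedate_to_datetime(value)
    now = now or datetime.now(timezone.utc)
    return max(0, int((when - now).total_seconds()))


def suggested_delay(headers: Mapping[str, str], body: dict,
                    now: Optional[datetime] = None) -> int:
    """Hypothetical precedence: header first, then the proposed body field."""
    from_header = retry_after_seconds(headers, now)
    if from_header is not None:
        return from_header
    return body.get("retry_after", 0)


print(suggested_delay({"Retry-After": "120"}, {}))  # 120
print(suggested_delay({}, {"retry_after": -1}))     # -1
```

One difference the sketch makes visible: delta-seconds is a non-negative integer, so the header alone cannot express the proposal's negative "do not repeat automatically" value.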

fmui · 2019-03-26T11:53:14Z

@georgi-lozev I agree. The right approach would be to return these states in the response of the fetch endpoint. But there are brokers that cannot provide a fetch endpoint. The only way I can think of that works for all brokers to transport this information is the error response. Do you have a better idea?

jberkhahn · 2019-04-23T20:50:01Z

This seems reasonable from a k8s perspective. We'd have to plumb this through to our controller, but most of these values already have corresponding values in our reconciliation logic that they could be plugged into.

tinygrasshopper · 2019-05-07T12:51:53Z

Thinking a bit more about this, I think this PR does 3 things:

(Feature 1) Adds a mechanism to denote the health/usability of a service instance.
(Feature 2) Adds a mechanism to denote whether subsequent updates are possible on a service instance.
(Feature 3) Adds a mechanism to inform the platform when the subsequent update can be triggered.

A comment on Feature 3:
Are we exposing the complexity of the retry mechanism in two places in the spec? I think that is unnecessary. Broker authors can already use the existing Retry-After header to signal that the operation is still in progress.

Today, retrying an update can be encapsulated in the Last Operation invocation:

Whereas after this PR, in addition to the above mechanism, broker authors will also be able to have the update itself retried:

What are the benefits of introducing an alternate mechanism for retry? Can we achieve the same result by adding the health/usability indicator to last operation?
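As a sketch of the alternative raised here, a broker could carry the usability indicators in a `"state": "failed"` last operation body instead of in an error response, reusing the field names from the spec.md diff earlier in this thread. The builder function and its operation labels are hypothetical; it only emits a field when it applies to the operation, and it requires a non-empty `description` as the diff does.

```python
from typing import Dict, Optional


def last_operation_failed_body(
    operation: str,
    description: str,
    instance_usable: Optional[bool] = None,
    update_repeatable: Optional[bool] = None,
) -> Dict:
    """Build a hypothetical failed last operation body with usability indicators."""
    if not description:
        raise ValueError("description, if present, MUST be a non-empty string")
    body: Dict = {"state": "failed", "description": description}
    if instance_usable is not None:
        # Only meaningful after a failed update or deprovision.
        if operation not in ("update", "deprovision"):
            raise ValueError("instance_usable only applies to update or deprovision")
        body["instance_usable"] = instance_usable
    if update_repeatable is not None:
        # Only meaningful after a failed update.
        if operation != "update":
            raise ValueError("update_repeatable only applies to update")
        body["update_repeatable"] = update_repeatable
    return body


print(last_operation_failed_body("deprovision", "Backend unreachable.", instance_usable=False))
```

Omitting a field (passing `None`) leaves the Platform to apply the proposed default of `true`.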

mattmcneeney · 2020-05-13T11:17:01Z

@fmui Do we still need this PR?

fmui and others added 2 commits August 6, 2018 09:24

Merge pull request #1 from openservicebrokerapi/master

c6d21bf

Update

Allow Service Brokers to indicate the state of a Service Instance after a failed update or deprovisioning

1385d2a

fmui mentioned this pull request Feb 18, 2019

Clarify the state of an instance after a failed update or delete #570

Closed

fmui and others added 2 commits February 18, 2019 14:25

Merge branch 'master' into instance-state2

edd1ab9

fixed keywords

59341ec

waterlink reviewed Mar 7, 2019

waterlink suggested changes Mar 7, 2019

fmui added 2 commits March 12, 2019 09:42

Changed 'retry_delay' to 'retry_after'

2bb9bec

Removed duplicate description field.

1a3c917

waterlink approved these changes Mar 12, 2019

tinygrasshopper reviewed Mar 12, 2019

georgi-lozev reviewed Mar 12, 2019

fmui added the do not merge label May 8, 2019

fmui mentioned this pull request May 8, 2019

Indicate if a Service Instance is still usable after a failed update or deprovisioning and if an update can be repeated #661

Merged


mattmcneeney added this to the 2.16 milestone Jun 24, 2019

mattmcneeney modified the milestones: 2.16, 2.17 May 15, 2020
