Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support annotations for versioning behavior #1692

Conversation

antlai-temporal
Copy link
Contributor

@antlai-temporal antlai-temporal commented Oct 28, 2024

What was changed

This PR supports providing a default versioning behavior in Worker Options, and a programmatic API to override these defaults:

For example:

	workflow.SetVersioningBehavior(ctx, workflow.VersioningBehaviorPinned)

called during the first workflow task sets the versioning mode for that workflow to Pinned, eliminating the need of patching.

And

	w := worker.New(c, ts.taskQueueName, worker.Options{
		DeploymentOptions: worker.DeploymentOptions{
                            BuildID:                   "1.0",
		            UseBuildIDForVersioning:   true,
		            DeploymentName:            "deploy-test1",
		            DefaultVersioningBehavior: workflow.VersioningBehaviorAutoUpgrade,
              }
	})

sets the default versioning behavior to AutoUpgrade. Note that DeploymentName, UseBuildIDForVersioning and BuildID need to be set to enable versioning-3.
If versioning-3 is on, but no default or programmatic versioning behavior has been set for a workflow, the server will fail its first workflow task.

Why?

A hint that a workflow is short-running, and therefore, does not need to be moved to the new Build ID on the next task, eliminates the need of patching for many workflow types.

Checklist

  1. Closes
    Add workflow annotations for versioning #1663
  2. How was this tested:
    Integration test that uses versioning-2 and a grpc interceptor to show that versioning behaviors are propagated with responses.
    The dependency on versioning-2 is just to set a default Build ID.

workflow/workflow.go Show resolved Hide resolved
internal/workflow.go Show resolved Hide resolved
test/integration_test.go Outdated Show resolved Hide resolved
internal/workflow.go Outdated Show resolved Hide resolved
@@ -1611,6 +1611,15 @@ func SetCurrentDetails(ctx Context, details string) {
getWorkflowEnvOptions(ctx).currentDetails = details
}

// SetVersioningBehavior sets the strategy to upgrade this workflow when the default Build ID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we intentionally omit a getter here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, versioning behavior can be dependent on worker options so if app code branches based on it they will get an NDE. And in general, we wanted to be able to change annotations without needing patching, so no-getter helps there...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the versioning behaviour written into history?

Copy link
Member

@cretz cretz Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to be able to get something you may have set. I think this should either be in workflow register options and/or this should be called SetVersioningBehaviorOverride (and GetVersioningBehaviorOverride will just return what you have called with the "set", not the resolved value). I assume there are good use cases for setting this do different values throughout the workflow logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Quinn-With-Two-Ns My understanding is that it is not part of the history, at least I do nothing special during replay, @ShahabT can you confirm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the proto changes that @ShahabT did, the versioning behavior has been added to WorkflowTaskCompletedEventAttributes in history. I don't do anything with it (maybe I should?) but that would give the server a placeholder to record the versioning behavior that I send after each task...

Copy link
Member

@cretz cretz Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, so it is in history, which makes sense if workflow author is expected to update it mid-workflow. I do wonder whether the getter should take that into account. I'll defer to y'all on desired behavior of the getter, but I do believe we should not have a setter without a getter (even if we have to add caveats to the behavior of the getter).

Copy link

@carlydf carlydf Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm misunderstanding, versioning behavior is more of a workflow worker level thing not workflow run/execution thing.

@cretz In general, I think of versioning behavior as primarily a workflow type level thing, with a workflow worker level default in case it's not specified at the type level (but from a product perspective, we want to encourage explicitly setting it for each workflow type, so that any developer reading the workflow definition can immediately see what the expected behavior is, and keep that in mind when they make changes to the code that could change it from short-running <-> long-running).

Still, there are some limited use-cases for overriding the versioning behavior at the workflow execution level.

  1. Overriding the versioning behavior of an already-running workflow execution (UpdateWorkflowExecutionProperties):

    • User wants to pin a specific workflow execution that had a weird bug and is still being investigated, but they want to keep the workflow type as a whole unpinned because it is still long-running in general. I think our cloud release validation workflow would be an example of this, or other long-running test pipelines.
    • A lot of other versioning behavior change scenarios are likely to be applied for all workflow executions of the same type, in which case the behavior change could be accomplished by changing the annotation in the workflow code, deploying a new build, then using the batch api to manually change the build id of running workflows so that they use the new workflow code and pick up the new annotation.
    • I still think being able to pin a specific workflow execution without pinning the whole type is useful for debugging. Max liked this idea in my DevX presentation on UpdateWorkflowExecutionProperties scenarios.
    • I think overriding a specific wf execution from pinned --> unpinned is much less important, because the use case is less clear. If you want a specific wf execution to run on the current default build id, you can just change it's build id explicitly. Most use cases for moving to unpinned apply across the wf type.
  2. Overriding the versioning behavior of a workflow execution on workflow start:

    • Customers have requested that we provide a way to start a one-off workflow on a specific build id that is not yet the default build. This is the basic functionality they need to set up deployment canaries easily.
    • It seems like this would involve some parameter such as pinned_build_id_override in StartWorkflowOptions. I think it would be nice if this is symmetric to a concept of pinned_build_id_override in UpdateWorkflowExecutionProperties

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The server stores the current build id and versioning behavior in Mutable State and in visibility. The previously-used build ids are also stored in visibility -- I'm not sure if previously-used build ids are also stores in Mutable State.

"getting" the current versioning behavior of a given workflow execution could be done by reading the mutable state or visibility entry.

Querying a bunch of workflow executions for a specific versioning behavior and/or build id combination would be a visibility query (we will use this for reachability).

It seems to me that to support a workflow-execution-level versioning behavior override, the server would need to store not just the (versioning behavior, build id) tuple, but also an is_override (or some other name) bit, that basically means, if is_override, ignore whatever the sdk says about my workflow type and do task routing based on what is currently written in my mutable state as an override. To me it seems like it could be handled on the server side, since that's where the task routing decision is made.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think of versioning behavior as primarily a workflow type level thing, with a workflow worker level default in case it's not specified at the type level (but from a product perspective, we want to encourage explicitly setting it for each workflow type

If this is the case, it also needs to be added to workflow.RegisterOptions.

"getting" the current versioning behavior of a given workflow execution could be done by reading the mutable state or visibility entry.

We don't want to get from server what's not already in history. We just need to make sure any SetWhatever inside a workflow has a corresponding GetWhatever. If this just returns the value that was "set" before in the same workflow code and not the actual server value, that's fine (though it does seem like task complete may have some form of the value). The docs of the getter just need to clarify.

@antlai-temporal
Copy link
Contributor Author

I think the latest commit addresses most of the issues raised, let me know if it is ready to merge...

@cretz
Copy link
Member

cretz commented Oct 30, 2024

@antlai-temporal - based on thread above, I wonder if this needs to be on worker.RegisterOptions too if that is the most common way this should be set (at the workflow type level).

@antlai-temporal
Copy link
Contributor Author

@antlai-temporal - based on thread above, I wonder if this needs to be on worker.RegisterOptions too if that is the most common way this should be set (at the workflow type level).

We could have a type-level default in RegisterOptions that can be overridden by the programmatic API. That would be more consistent with the Java annotation+programmatic override I was thinking on. The Getter will probably return just the programmatic to be consistent that we can change worker defaults without patching.
I'm trying to unblock testing for the server team, so I may do that in a separate PR if that's ok...

// SetVersioningBehavior sets the strategy to upgrade this workflow when the default Build ID
// has changed, and Worker Versioning-3 has been enabled.
//
// SetVersioningBehavior should be called during the first workflow task, and the context `ctx` should
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it is called, but not in the first workflow task?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there was no worker level default, the server will fail the first workflow task, so there is no progress until a worker sets a default.
If tthere was a worker level default, it will change it ok.
In the comment I'm encouraging to set it in the first task to avoid the first case...

@cretz
Copy link
Member

cretz commented Oct 30, 2024

I'm trying to unblock testing for the server team, so I may do that in a separate PR if that's ok...

I think it's probably ok (will defer to @Quinn-With-Two-Ns) but probably want to create an issue to track the work. I do think we should be encouraging that register-options way to set worker-type level value and this should be a lesser-used in-workflow override.

In the comment I'm encouraging to set it in the first task to avoid the first case...

This is why we should be encouraging the workflow register options approach over the inline approach (can even change the name of the inline one to something less wieldy, e.g. SetVersioningBehaviorOverride).

@Quinn-With-Two-Ns
Copy link
Contributor

Yeah OK merging into the branch, +1 to everything Chad said.

Once we are further along I would also like to see documentation as to how this workflow API is supposed to interact with versioning info set externally described here #1692 (comment).

@antlai-temporal
Copy link
Contributor Author

Yeah OK merging into the branch, +1 to everything Chad said.

Once we are further along I would also like to see documentation as to how this workflow API is supposed to interact with versioning info set externally described here #1692 (comment).

@cretz @Quinn-With-Two-Ns
I added the issue #1698 . Merging as it is to unblock server team, but will add another PR before merging to main.
We will clarify the interactions with the external workflow property setting (after discussing today with @carlydf they seem orthogonal). The keyword "override" is used by this external mechanism, I will stick to default vs non-default to avoid confusion. The getter will only return what it was set (as it does right now).

@antlai-temporal antlai-temporal merged commit 36167ae into temporalio:versioning-3 Oct 30, 2024
5 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants