-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Honor all non-completion commands #569
Honor all non-completion commands #569
Conversation
61e8a3c
to
ffa761c
Compare
d553b0b
to
9266c22
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but want to wait until temporalio/sdk-core#776 settles and the submodule can be updated to master
.
# If there are successful commands, we must remove all | ||
# non-query-responses after terminal workflow commands. We must do this | ||
# in place to avoid the copy-on-write that occurs when you reassign. | ||
seen_completion = False | ||
i = 0 | ||
while i < len(self._current_completion.successful.commands): | ||
command = self._current_completion.successful.commands[i] | ||
if not seen_completion: | ||
seen_completion = ( | ||
command.HasField("complete_workflow_execution") | ||
or command.HasField("continue_as_new_workflow_execution") | ||
or command.HasField("fail_workflow_execution") | ||
or command.HasField("cancel_workflow_execution") | ||
) | ||
elif not command.HasField("respond_to_query"): | ||
del self._current_completion.successful.commands[i] | ||
continue | ||
i += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any concerns that removing this is backwards incompatible w/ already completed workflows, or can you confirm that in no-flag-replaying situations the core behavior was always the same (sans query stuff)? One thing you can do is make a workflow that has post-complete command, run it in older SDK, grab JSON history, and run replayer in tests here with new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite following this bit of the question:
or can you confirm that in no-flag-replaying situations the core behavior was always the same
Here's how I am thinking of it:
-
sdk-python v1.0 was released in Jan 2023, and dropped post-terminal commands from the beginning, until this change.
-
Therefore, prior to this change, all Python WFTs had their post-terminal commands dropped.
-
Incidentally, Core started also dropping post-terminal commands since March 2023: Drop all post-terminal commands & sort activation jobs sdk-core#502
-
The new SDK code drops post-terminal commands when replaying without the flag set, and there is test coverage for this: https://github.com/temporalio/sdk-core/blob/master/core/src/core_tests/workflow_tasks.rs#L2558-L2577. Therefore we do not expect NDEs: the command sequence applied to core state machines when replaying without the flag will be the same as it was prior to this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite following this bit of the question:
Think about a user with an old workflow (i.e. sans flag). If you remove the old Python behavior that runs sans flag, it now relies on the old Core behavior sans flag. If that old behavior doesn't match Python's old behavior, they will get a non-determinism error. So we need to confirm that old Core code does the same thing as old Python code before removing old Python code. Did they drop post-terminal commands the same way? If so, we're all good here.
The new SDK code drops post-terminal commands when replaying without the flag set, and there is test coverage for this
IMO you should grab a workflow history JSON or two from a workflow that had post-terminal commands from a Python SDK before this change, then run it through a replayer in the test on this version. There's a couple of other JSON files in the test suite that you can see how their tests are doing this. Also, I assume the test in this PR is testing that now commands after workflow complete are properly included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think about a user with an old workflow (i.e. sans flag). If you remove the old Python behavior that runs sans flag, it now relies on the old Core behavior sans flag. If that old behavior doesn't match Python's old behavior, they will get a non-determinism error. So we need to confirm that old Core code does the same thing as old Python code before removing old Python code. Did they drop post-terminal commands the same way? If so, we're all good here.
Personally I would substitute s/old Core/new Core/
throughout this paragraph, since we're never going to be running old Core code: rather it's new Core code which, when replaying without the flag, is intended to behave as old Core did (i.e. truncating at first terminal command). This is tested in two different ways in the Core test suite, but I agree that SDK-specific tests replaying old workflows with post-terminal commands would be good too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Makes sense, yeah whatever the terms are that mean "Workflows with post-complete commands on previous Python SDK versions work the exact same with this PR"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the replay backward compatibility test. This should be ready to go.
bc53e49
to
337cae1
Compare
72961fa
to
7a847d5
Compare
The UpdateCompletionAfterWorkflowReturn workflow above features an update handler that returns | ||
after the main workflow coroutine has exited. It will (if an update is sent in the first WFT) | ||
generate a raw command sequence (before sending to core) of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, great comment, makes sense 👍
Fixes #528
With this change we honor all non-completion commands emitted by workflow coroutines, even if they come after a completion command (i.e. complete/CAN/cancel/fail). A consequence is that when an update completion is returned in the same WFT response as a workflow completion, the client will always get the update response; previously that was only the case if the update handler returned prior to any completion command being emitted by another coroutine.
The solution involves devolving responsibilites for this logic to core: see explanatory code comments in temporalio/sdk-core#776.
Evidence that this is correct