Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Honor all non-completion commands #569

Merged

Conversation

dandavison
Copy link
Contributor

@dandavison dandavison commented Jul 1, 2024

Fixes #528

With this change we honor all non-completion commands emitted by workflow coroutines, even if they come after a completion command (i.e. complete/CAN/cancel/fail). A consequence is that when an update completion is returned in the same WFT response as a workflow completion, the client will always get the update response; previously that was only the case if the update handler returned prior to any completion command being emitted by another coroutine.

The solution involves devolving responsibilites for this logic to core: see explanatory code comments in temporalio/sdk-core#776.

Evidence that this is correct

$ pytest -k "completion" tests/worker/test_workflow.py

# These two fail on main
tests/worker/test_workflow.py::test_update_completion_is_honored_when_after_workflow_return_1 PASSED
tests/worker/test_workflow.py::test_update_completion_is_honored_when_after_workflow_return_2 PASSED

# These two check that we respect ordering of completion commands. They pass on main as well as with the new changes.
tests/worker/test_workflow.py::test_first_of_two_signal_completion_commands_is_honored PASSED                                                                                                                
tests/worker/test_workflow.py::test_workflow_return_is_honored_when_it_precedes_signal_completion_command PASSED

@dandavison dandavison force-pushed the sdk-528-let-coroutines-complete-before-setting-completion branch from 61e8a3c to ffa761c Compare July 15, 2024 11:24
@dandavison dandavison changed the title Let workflow coroutines settle before setting completion Honor all non-completion commands Jul 15, 2024
@dandavison dandavison force-pushed the sdk-528-let-coroutines-complete-before-setting-completion branch 2 times, most recently from d553b0b to 9266c22 Compare July 15, 2024 13:03
@dandavison dandavison marked this pull request as ready for review July 15, 2024 22:18
@dandavison dandavison requested a review from a team as a code owner July 15, 2024 22:18
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but want to wait until temporalio/sdk-core#776 settles and the submodule can be updated to master.

Comment on lines -406 to -423
# If there are successful commands, we must remove all
# non-query-responses after terminal workflow commands. We must do this
# in place to avoid the copy-on-write that occurs when you reassign.
seen_completion = False
i = 0
while i < len(self._current_completion.successful.commands):
command = self._current_completion.successful.commands[i]
if not seen_completion:
seen_completion = (
command.HasField("complete_workflow_execution")
or command.HasField("continue_as_new_workflow_execution")
or command.HasField("fail_workflow_execution")
or command.HasField("cancel_workflow_execution")
)
elif not command.HasField("respond_to_query"):
del self._current_completion.successful.commands[i]
continue
i += 1
Copy link
Member

@cretz cretz Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any concerns that removing this is backwards incompatible w/ already completed workflows, or can you confirm that in no-flag-replaying situations the core behavior was always the same (sans query stuff)? One thing you can do is make a workflow that has post-complete command, run it in older SDK, grab JSON history, and run replayer in tests here with new code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite following this bit of the question:

or can you confirm that in no-flag-replaying situations the core behavior was always the same

Here's how I am thinking of it:

Copy link
Member

@cretz cretz Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite following this bit of the question:

Think about a user with an old workflow (i.e. sans flag). If you remove the old Python behavior that runs sans flag, it now relies on the old Core behavior sans flag. If that old behavior doesn't match Python's old behavior, they will get a non-determinism error. So we need to confirm that old Core code does the same thing as old Python code before removing old Python code. Did they drop post-terminal commands the same way? If so, we're all good here.

The new SDK code drops post-terminal commands when replaying without the flag set, and there is test coverage for this

IMO you should grab a workflow history JSON or two from a workflow that had post-terminal commands from a Python SDK before this change, then run it through a replayer in the test on this version. There's a couple of other JSON files in the test suite that you can see how their tests are doing this. Also, I assume the test in this PR is testing that now commands after workflow complete are properly included?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think about a user with an old workflow (i.e. sans flag). If you remove the old Python behavior that runs sans flag, it now relies on the old Core behavior sans flag. If that old behavior doesn't match Python's old behavior, they will get a non-determinism error. So we need to confirm that old Core code does the same thing as old Python code before removing old Python code. Did they drop post-terminal commands the same way? If so, we're all good here.

Personally I would substitute s/old Core/new Core/ throughout this paragraph, since we're never going to be running old Core code: rather it's new Core code which, when replaying without the flag, is intended to behave as old Core did (i.e. truncating at first terminal command). This is tested in two different ways in the Core test suite, but I agree that SDK-specific tests replaying old workflows with post-terminal commands would be good too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Makes sense, yeah whatever the terms are that mean "Workflows with post-complete commands on previous Python SDK versions work the exact same with this PR"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the replay backward compatibility test. This should be ready to go.

@dandavison dandavison force-pushed the sdk-528-let-coroutines-complete-before-setting-completion branch 6 times, most recently from bc53e49 to 337cae1 Compare July 23, 2024 20:17
@dandavison dandavison force-pushed the sdk-528-let-coroutines-complete-before-setting-completion branch 4 times, most recently from 72961fa to 7a847d5 Compare August 3, 2024 01:30
Comment on lines +333 to +335
The UpdateCompletionAfterWorkflowReturn workflow above features an update handler that returns
after the main workflow coroutine has exited. It will (if an update is sent in the first WFT)
generate a raw command sequence (before sending to core) of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, great comment, makes sense 👍

@dandavison dandavison merged commit 50914c4 into main Aug 5, 2024
12 checks passed
@dandavison dandavison deleted the sdk-528-let-coroutines-complete-before-setting-completion branch August 5, 2024 18:19
dandavison added a commit to dandavison/temporalio-samples-python that referenced this pull request Aug 5, 2024
dandavison added a commit to temporalio/samples-python that referenced this pull request Aug 12, 2024
dandavison added a commit to dandavison/temporalio-samples-python that referenced this pull request Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Do not set workflow completion until after all coroutines have settled in the task
3 participants