[Bug] [TaskExecutionRunnable] Sub workflow task always repeat run in master-server failover #16767

Open
3 tasks done
reele opened this issue Nov 5, 2024 · 4 comments
Labels
question Further information is requested

Comments

@reele
Contributor

reele commented Nov 5, 2024

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

During master failover, FailoverCoordinator.getFailoverWorkflowsForMaster() finds all workflows that need failover. When TaskExecutionRunnable.failover() is called for a sub-workflow task, takeOverTaskFromExecutor() returns false because the task is a logic task. As a result, a new sub-workflow task instance is created and its TaskStartLifecycleEvent is published, so the sub-workflow runs again during the failover process.

@Override
public void failover() {
    checkState(isTaskInstanceInitialized(), "The task instance is not initialized, can't failover.");
    if (takeOverTaskFromExecutor()) {
        log.info("Failover task success, the task {} has been taken-over from executor", taskInstance.getName());
        return;
    }
    this.taskInstance = applicationContext.getBean(TaskInstanceFactories.class)
            .failoverTaskInstanceFactory()
            .builder()
            .withTaskInstance(taskInstance)
            .build();
    initializeTaskExecutionContext();
    getWorkflowEventBus().publish(TaskStartLifecycleEvent.of(this));
}

private boolean takeOverTaskFromExecutor() {
    checkState(isTaskInstanceInitialized(), "The task instance is null, can't take over from executor.");
    if (TaskTypeUtils.isLogicTask(taskInstance.getTaskType())) {
        return false;
    }

So I think it would be better to check the status of the sub-workflow instance (via the DAO or server-to-server communication) and take it over, instead of creating a whole new task instance.
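The proposal can be pictured as a small decision function: look up the sub-workflow instance linked to the task, and take it over only if it exists and is still healthy. The following is a minimal, self-contained sketch of that idea, not DolphinScheduler code. `SubWorkflowState`, `FailoverDecision`, `SubWorkflowLookup`, and `SubWorkflowFailover` are hypothetical names, and treating only `RUNNING` as "good status" is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical states of a sub-workflow instance (not DolphinScheduler's real enum).
enum SubWorkflowState { RUNNING, SUCCESS, FAILURE }

// Hypothetical outcome of the failover decision for a sub-workflow task.
enum FailoverDecision { TAKE_OVER, RECREATE }

// Hypothetical lookup, standing in for a DAO query or a call to another server.
interface SubWorkflowLookup {
    Optional<SubWorkflowState> findStateByTaskInstanceId(int taskInstanceId);
}

final class SubWorkflowFailover {
    private SubWorkflowFailover() {
    }

    // Assumption: only a RUNNING sub-workflow instance counts as "in good status".
    // Anything else (missing, failed, finished) falls back to recreating the task instance.
    static FailoverDecision decide(int taskInstanceId, SubWorkflowLookup lookup) {
        Optional<SubWorkflowState> state = lookup.findStateByTaskInstanceId(taskInstanceId);
        boolean healthy = state.isPresent() && state.get() == SubWorkflowState.RUNNING;
        return healthy ? FailoverDecision.TAKE_OVER : FailoverDecision.RECREATE;
    }
}

// In-memory lookup used only to exercise the sketch.
final class InMemorySubWorkflowLookup implements SubWorkflowLookup {
    private final Map<Integer, SubWorkflowState> states = new HashMap<>();

    void put(int taskInstanceId, SubWorkflowState state) {
        states.put(taskInstanceId, state);
    }

    @Override
    public Optional<SubWorkflowState> findStateByTaskInstanceId(int taskInstanceId) {
        return Optional.ofNullable(states.get(taskInstanceId));
    }
}

final class SubWorkflowFailoverDemo {
    public static void main(String[] args) {
        InMemorySubWorkflowLookup lookup = new InMemorySubWorkflowLookup();
        lookup.put(1, SubWorkflowState.RUNNING);
        System.out.println(SubWorkflowFailover.decide(1, lookup)); // prints TAKE_OVER
        System.out.println(SubWorkflowFailover.decide(2, lookup)); // prints RECREATE (no instance found)
    }
}
```

Which states should count as healthy, and whether the lookup goes through the database or another master, are open design questions that this sketch does not settle.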

What you expected to happen

Take over the sub-workflow task if the sub-workflow instance is in a healthy state.

How to reproduce

Execute a workflow with a sub-workflow, restart the master-server, then query the database.

Anything else

No response

Version

dev

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@reele reele added bug Something isn't working Waiting for reply Waiting for reply labels Nov 5, 2024
@SbloodyS
Member

SbloodyS commented Nov 5, 2024

The sub-workflow task is a special task type. During the fault tolerance period, all task types will generate new task instances, which is a unified operation. So this is not a bug.

@SbloodyS SbloodyS added question Further information is requested and removed bug Something isn't working Waiting for reply Waiting for reply labels Nov 5, 2024
@reele
Contributor Author

reele commented Nov 5, 2024

> The sub-workflow task is a special task type. During the fault tolerance period, all task types will generate new task instances, which is a unified operation. So this is not a bug.

Agreed. Perhaps we can make some improvements in the future to avoid tasks being executed repeatedly.

@SbloodyS
Member

SbloodyS commented Nov 5, 2024

> Agreed. Perhaps we can make some improvements in the future to avoid tasks being executed repeatedly.

Task instances usually run as Linux processes, and the same process id may be occupied by other programs after downtime. So it's difficult to achieve.

@reele
Contributor Author

reele commented Nov 5, 2024

> > Agreed. Perhaps we can make some improvements in the future to avoid tasks being executed repeatedly.
>
> Task instances usually run as Linux processes, and the same process id may be occupied by other programs after downtime. So it's difficult to achieve.

Yes, but what I mean is that the sub-workflow instance has already been taken over during the master's failover process; it is just not visible and is still running in the background. Then, when the parent workflow is failed over, it reruns the sub-workflow (through the sub-workflow task), so the original sub-workflow instance and the new one may both be running at the same time.
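The scenario described above can be illustrated with a toy model. This is only a sketch of the reasoning, assuming the sub-workflow instance survives failover on its own and the parent's sub-workflow task then starts another one. `ToySubWorkflowInstance` and `ToyFailoverModel` are hypothetical and do not mirror DolphinScheduler classes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and do not mirror DolphinScheduler's code.
final class ToySubWorkflowInstance {
    final int id;
    final int parentTaskId;
    boolean running = true;

    ToySubWorkflowInstance(int id, int parentTaskId) {
        this.id = id;
        this.parentTaskId = parentTaskId;
    }
}

final class ToyFailoverModel {
    private final List<ToySubWorkflowInstance> instances = new ArrayList<>();
    private int nextId = 1;

    // Starts a sub-workflow instance on behalf of the given parent task.
    ToySubWorkflowInstance startSubWorkflow(int parentTaskId) {
        ToySubWorkflowInstance instance = new ToySubWorkflowInstance(nextId++, parentTaskId);
        instances.add(instance);
        return instance;
    }

    // Models failover of the parent's sub-workflow task.
    // checkExisting = false: always start a new instance (the behaviour reported in this issue).
    // checkExisting = true: reuse a still-running instance (the behaviour proposed in this issue).
    void failoverSubWorkflowTask(int parentTaskId, boolean checkExisting) {
        if (checkExisting && countRunning(parentTaskId) > 0) {
            return;
        }
        startSubWorkflow(parentTaskId);
    }

    // Counts running sub-workflow instances that belong to the given parent task.
    int countRunning(int parentTaskId) {
        int count = 0;
        for (ToySubWorkflowInstance instance : instances) {
            if (instance.running && instance.parentTaskId == parentTaskId) {
                count++;
            }
        }
        return count;
    }
}

final class ToyFailoverDemo {
    public static void main(String[] args) {
        ToyFailoverModel reported = new ToyFailoverModel();
        reported.startSubWorkflow(7);               // original sub-workflow, still running after failover
        reported.failoverSubWorkflowTask(7, false); // parent failover recreates the task
        System.out.println(reported.countRunning(7)); // prints 2

        ToyFailoverModel proposed = new ToyFailoverModel();
        proposed.startSubWorkflow(7);
        proposed.failoverSubWorkflowTask(7, true);  // parent failover reuses the running instance
        System.out.println(proposed.countRunning(7)); // prints 1
    }
}
```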
