[Workflow] GRPC connection to workflow runtime doesn't self-heal when app restarts #1293

Open
olitomlinson opened this issue May 22, 2024 · 3 comments
Labels
area/workflow kind/bug Something isn't working P0

olitomlinson commented May 22, 2024

cc @philliphoff

Dapr runtime 1.13.2 (I have not tried any other versions)

Expected Behavior

The gRPC connection to the workflow runtime is re-established after the app process (not the Dapr process) crashes and is restarted.

Actual Behavior

The gRPC connection to the workflow runtime is not re-established after the app process (not the Dapr process) crashes and is restarted.

Steps to Reproduce the Problem

Clone my repro from https://github.com/olitomlinson/dapr-workflow-examples

  1. Run docker compose -f compose-1-instance-3-schedulers.yml build
  2. Run docker compose -f compose-1-instance-3-schedulers.yml up
  3. Stop the app container in Compose; it will be named something like workflow-app-a-1
  4. Start the app container in Compose again
  5. Observe the logs in workflow-app-a-1; the following error repeats forever:

The gRPC server for Durable Task gRPC worker is unavailable. Will continue retrying.
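
The reproduction steps above could be scripted roughly as follows. This is only a sketch: it assumes the container really is named workflow-app-a-1 (the issue says only "something like" that), it uses `up -d` instead of a plain `up` so the script can continue past step 2, and it stops and starts the container with `docker stop` / `docker start`. The function names are made up for illustration.

```python
import subprocess

COMPOSE_FILE = "compose-1-instance-3-schedulers.yml"
# Assumption: the real container name may differ on your machine.
APP_CONTAINER = "workflow-app-a-1"


def repro_commands(compose_file=COMPOSE_FILE, container=APP_CONTAINER):
    """Return the docker commands for the repro steps, in order."""
    return [
        ["docker", "compose", "-f", compose_file, "build"],
        # "-d" (detached) is an addition so the script is not blocked here.
        ["docker", "compose", "-f", compose_file, "up", "-d"],
        ["docker", "stop", container],
        ["docker", "start", container],
        # Follow the app logs and watch for the repeating "unavailable" error.
        ["docker", "logs", "--follow", container],
    ]


def run_repro(dry_run=True):
    """Print each command; execute it as well unless dry_run is set."""
    for cmd in repro_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    run_repro(dry_run=True)
```

Calling `run_repro(dry_run=False)` from the root of the repro repository would execute the commands; the final `docker logs --follow` keeps running until interrupted.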

Release Note

RELEASE NOTE:

cgillum (Contributor) commented Sep 12, 2024

This may have been fixed already in 1.14 as part of pulling in some fixes in durabletask-go. @olitomlinson are you able to verify?

cgillum added the P0 label Sep 12, 2024
cgillum added this to the v1.15 milestone Sep 12, 2024
olitomlinson (Author) commented

> This may have been fixed already in 1.14 as part of pulling in some fixes in durabletask-go. @olitomlinson are you able to verify?

Still an issue in 1.14.4.

famarting (Contributor) commented

I find this confusing. For the go-sdk I made the client retry the worker connection to Dapr indefinitely, and I think every SDK should behave the same way. I believe the Python SDK already does.
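
For illustration, the "retry indefinitely" behavior described in the comment above might look roughly like the sketch below. It is not taken from any Dapr SDK: `connect_with_retry`, the `connect` callable, and the delay parameters are all hypothetical, and the capped exponential backoff with jitter is just one reasonable choice.

```python
import random
import time


def connect_with_retry(connect, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, backing off between attempts.

    `connect` is a hypothetical callable that opens the worker stream to the
    sidecar and raises ConnectionError while the sidecar is unreachable.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError as exc:
            # Cap the exponent so the delay calculation never overflows,
            # however long the loop keeps running.
            delay = min(max_delay, base_delay * (2 ** min(attempt, 10)))
            # Jitter: wait between 50% and 100% of the computed delay.
            delay *= random.uniform(0.5, 1.0)
            print(f"worker connection failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
            attempt += 1
```

The key point is that the loop never gives up and builds a fresh connection on every attempt, so a restart on either side of the connection is eventually recovered from.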
