Add saga pattern support in Dapr workflow #47

skyao · 2023-12-04T03:43:52Z

Related issue is #48

Signed-off-by: Sky Ao <[email protected]>

berndverst · 2023-12-05T04:34:30Z

While this proposal sounds useful I would like to see the core implementation / common interface, API methods outlined in pseudocode.

It's very hard to interpret Java code and think through how this would work in Python for example.

Could you update the proposal to identify the generic structure that all SDKs must implement?

In this proposal we can then discuss whether this will make sense. After all, we want to have a unified implementation across all SDKs, so we will want to agree on the same implementation strategy in this proposal.

artursouza

LGTM. Please get approval from the other SDK maintainers. This is not a blocker for the Java SDK implementation, but that implementation might change based on how this proposal evolves, which is OK since the workflow SDK is version 0.x.

artursouza · 2023-12-05T07:44:45Z

While this proposal sounds useful I would like to see the core implementation / common interface, API methods outlined in pseudocode.

It's very hard to interpret Java code and think through how this would work in Python for example.

Could you update the proposal to identify the generic structure that all SDKs must implement?

In this proposal we can then discuss whether this will make sense. After all, we want to have a unified implementation across all SDKs, so we will want to agree on the same implementation strategy in this proposal.

I agree. I think the implementation details will evolve as the Java implementation serves as a reference. I am not blocking the Java implementation on this proposal being merged, with the understanding that the implementation can change (which is OK for version 0.x of the workflow SDK).

skyao · 2023-12-05T07:59:38Z

While this proposal sounds useful I would like to see the core implementation / common interface, API methods outlined in pseudocode.

It's very hard to interpret Java code and think through how this would work in Python for example.

Could you update the proposal to identify the generic structure that all SDKs must implement?

In this proposal we can then discuss whether this will make sense. After all, we want to have a unified implementation across all SDKs, so we will want to agree on the same implementation strategy in this proposal.

Good suggestion, I will update it soon.

halspang

Main comment is about the registering of the activity and its compensation separately, I'd prefer to not do that if we can avoid it.

I am a bit worried about having to reimplement this across different SDKs though, specifically around how we do error handling/compensation triggering. If we can register the activity together with its compensation, is it possible to have the runtime trigger it? I think that makes the code cleaner and leaves fewer implementation details in the SDK.

Chatted with Chris a little bit and it seems like the above isn't really something the runtime is aware of atm in terms of which activity is executing. I'm just curious if we've thought about it/how much work it may be and if it's worth it. In an ideal world I'd love to see the runtime be able to walk back the entire workflow itself but I may be off in terms of what it's capable of right now.

0011-S-Saga-pattern-support-in-dapr-workflow.md

kaibocai · 2023-12-08T17:23:33Z

Thanks @skyao for this great proposal. I agree with @halspang's comments. Besides that, LGTM!
Just one more comment: in the proposed implementation, if the user enables Saga, the SDK handles the compensation logic for them in its catch block. What if the user has some specific logic to run when an exception happens, in other words, they need to do some work (mostly logging) before triggering compensation? The current proposal doesn't seem to give them a chance to do this. They can do it in their own catch block, but that runs counter to the proposed solution.

mukundansundar · 2023-12-09T12:20:05Z

Overall the proposal LGTM.

I agree with both @halspang's and @kaibocai's comments.
Providing an interface that an activity can implement to add the compensation feature would be great.
Also, moving the logic for calling the compensation function into the workflow runtime will make the business logic code clearer.

Should we add another function like compensateOn(Predicate<T Result> result), so that when no exception is thrown but compensation must still be triggered because of a null result or some value in the result, the Predicate function can be used?

Additionally, should a hook be provided so that users can look into the error/exception if thrown before the compensation is triggered i.e. onError() is called before compensate() is called?

olitomlinson · 2023-12-11T22:06:28Z

I would love to see first-class support for Sagas in Workflows across all languages.

FWIW I had the same idea a good few months ago (which was based on the same idea I had of using Sagas for Azure Durable Functions 2 years ago) so it's nice to see that you've arrived in a place along the same line as where I was thinking!

shubham1172 · 2023-12-13T03:01:20Z

The proposal LGTM overall, thanks @skyao. Agree with the comments here, specifically (1) not having to register a compensation separately, and (2) if we can offload certain things to runtime, to avoid having to duplicate this to all SDKs and maintain them.

skyao · 2023-12-14T02:05:36Z

I would love to see first-class support for Sagas in Workflows across all languages.

We'll start with Java, and hopefully this proposal will be accepted soon so that the saga pattern implementation in the Dapr Java SDK can be released in Dapr v1.13. Then I plan to add Python and .NET support in the next Dapr release, v1.14.

Chris has agreed to do some optimization work in workflow for saga support, so it will be, as you say, "first-class support".

This should all happen soon.

cgillum · 2023-12-15T04:15:18Z

Sharing my thoughts on the feedback received so far:

Main comment is about the registering of the activity and its compensation separately, I'd prefer to not do that if we can avoid it.

@halspang (and others) if I understand the concern correctly, this is about reducing boilerplate. While I sympathize with that, I worry about how this can make the saga pattern overly opinionated and less useful. For example, I don't think there will always be a 1:1 relationship between an activity and its compensation. Furthermore, the exact compensation strategy (which activity to call) and parameters may also need to be different for the same activity executed at different points in the workflow. I would prefer that we start by erring on the side of a loosely coupled design.
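To make the point about call-site-specific compensation concrete, here is a minimal sketch in plain Java. It does not use any Dapr SDK types: the `Compensation` class, the "transfer" and "refund" activity names, and their inputs are all made up for illustration. It only shows that the same activity can need a different compensating activity and different parameters depending on where it runs in the workflow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CallSiteCompensation {
    // Hypothetical value type: which activity to call as compensation, and with what input.
    static final class Compensation {
        final String activity;
        final String input;

        Compensation(String activity, String input) {
            this.activity = activity;
            this.input = input;
        }
    }

    static List<String> run() {
        Deque<Compensation> saga = new ArrayDeque<>();
        List<String> log = new ArrayList<>();

        // First use of the made-up "transfer" activity: undo it with a reverse transfer.
        log.add("transfer A->B 100");
        saga.push(new Compensation("transfer", "B->A 100"));

        // Second use of the same activity: at this point a refund is the right undo.
        log.add("transfer B->merchant 100");
        saga.push(new Compensation("refund", "merchant->B 100"));

        // Assume a later step fails: compensations run in reverse registration order.
        while (!saga.isEmpty()) {
            Compensation c = saga.pop();
            log.add("compensate: " + c.activity + " " + c.input);
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

A design that binds one fixed compensation to the "transfer" activity at registration time could not express the second call site without extra hooks.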

In an ideal world I'd love to see the runtime be able to walk back the entire workflow itself

I understand the appeal of this idea but have two major concerns:

1. This would require non-trivial changes in the workflow runtime and I'm not convinced that the benefit justifies the cost at this point, especially given that this is a new feature that hasn't been developer tested. Though redundant, an SDK implementation will be cheaper to build, maintain, and change in the future based on user feedback.
2. This could necessarily make the compensation model even more inflexible because the runtime will have to make specific assumptions about when and how to invoke compensation logic.

I'll also point out that the activity registration verbosity problem can be solved in other ways, such as what @mukundansundar has proposed in the Python SDK. In other words, we may be able to treat these concerns separately.

My overarching hope is that the saga implementation can be loosely coupled to the overall SDK logic, especially given that it's a brand-new thing that we want to get more real-world feedback on.

cgillum · 2023-12-15T04:20:37Z

if the user enables Saga, we handle the compensation logic for them in the catch block for the SDK, what if the user has some specific logic when an exception happens, in other words, they need to do some work (mostly logging) before triggering compensation. It seems in the current proposal we don't provide a chance for them to do this. They can do it in their catch block but that is just opposed to the proposed solution.

@kaibocai I agree that the ability to have some custom logic is important, which is partly why I strongly hesitate to tightly couple an activity's execution to its compensation. In the proposed model, which is loosely coupled, developers can execute custom logic (e.g., logging) by catching exceptions, logging in the catch block, and then rethrowing the exception (or using manual compensation) to trigger the compensation. I don't think this approach is opposed to the proposed solution. I think the auto-compensation done by the SDK should be seen as a convenience feature.
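The catch, log, rethrow flow described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `CompensationStack` is a hypothetical helper, not a Dapr SDK type, the "debit" and "credit" steps are invented, and the outer try/catch in `run()` stands in for whatever auto-compensation the SDK would perform around the workflow body.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not a Dapr SDK type: keeps compensations in LIFO order.
class CompensationStack {
    private final Deque<Runnable> compensations = new ArrayDeque<>();

    void register(Runnable compensation) {
        compensations.push(compensation);
    }

    // Runs the registered compensations, most recently registered first.
    void compensate() {
        while (!compensations.isEmpty()) {
            compensations.pop().run();
        }
    }
}

class LogThenCompensate {
    // Records what happened so the order of events can be inspected.
    static final List<String> events = new ArrayList<>();

    // Stand-in for a workflow body; "debit" and "credit" are made-up steps.
    static void workflowBody(CompensationStack saga) {
        events.add("debit");
        saga.register(() -> events.add("undo-debit"));
        try {
            // Simulate the second step failing.
            throw new IllegalStateException("credit failed");
        } catch (IllegalStateException e) {
            // Custom logic (here, logging) runs first...
            events.add("log: " + e.getMessage());
            // ...then the exception is rethrown so that compensation is still triggered.
            throw e;
        }
    }

    // Stand-in for the auto-compensation an SDK could wrap around the workflow body.
    static List<String> run() {
        events.clear();
        CompensationStack saga = new CompensationStack();
        try {
            workflowBody(saga);
        } catch (RuntimeException e) {
            saga.compensate();
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The user's logging runs before any compensation, without the helper needing an `onError`-style hook.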

cgillum · 2023-12-15T04:27:40Z

Additionally, should a hook be provided so that users can look into the error/exception if thrown before the compensation is triggered i.e. onError() is called before compensate() is called?

@mukundansundar similar to my previous responses, this isn't necessary with the current proposal because you can control if/when the compensation gets triggered. The hooks that you're suggesting are only required if we tightly couple an activity invocation with its compensation logic, which I'm arguing we shouldn't do because it creates inflexibility (and requires us to complicate the design with behavior-customizing hooks, etc.).

olitomlinson · 2023-12-15T11:33:48Z

Having thought this through, I agree with @cgillum's comments.

Tightly coupling the compensation action to the Activity is very opinionated.

Let's assume that, for a given customer's use case, the opinionated model worked fine for a while, but then the user wished to change the Workflow to do something different that doesn't follow the opinionated model. How would they 'break out' and write a custom compensation just for that one Activity?

skyao · 2023-12-18T12:28:58Z

@halspang I'm so sorry, I deleted a comment by mistake.

This deleted note and the previous one said the same thing, and I replied to them together.

Signed-off-by: Sky Ao <[email protected]>

DeepanshuA · 2023-12-21T10:30:15Z

Went through the proposal today. @skyao I think it's a great proposal that would immensely benefit users.

I too agree with the comments about keeping registration of the compensation separate from activity registration.

@skyao I'd also like to understand a bit more about how compensation will work in different scenarios. For example, Activity A's compensation may need to be called if the workflow fails at Activity X, but may NOT need to be called if the workflow fails at Activity Y.
Will rules or filters need to be designed for this as part of the infrastructure, or will users need to express it explicitly in code, depending on language-specific error/exception semantics?

skyao · 2024-01-02T08:21:37Z

@skyao I'd also like to understand a bit more about how compensation will work in different scenarios. For example, Activity A's compensation may need to be called if the workflow fails at Activity X, but may NOT need to be called if the workflow fails at Activity Y. Will rules or filters need to be designed for this as part of the infrastructure, or will users need to express it explicitly in code, depending on language-specific error/exception semantics?

Currently we have not considered such complex compensation logic. The decision of whether to compensate is made across activities, so it requires some global data shared across activities. Of course, if activity x/y report failure through their output, it can be a simple check like this:

// Pseudocode: isOK() stands for whatever check the activity output supports.
Object outputX = ctx.callActivity("activity-x");
Object outputY = ctx.callActivity("activity-y");

......
Object outputA = ctx.callActivity("activity-a");
if (!outputX.isOK() && outputY.isOK()) {
   ctx.registerCompensation("compensation-b");
}

But if activity x/y fail with exceptions, users have to use try/catch so that the workflow continues on to activity-a even when activity-x and activity-y have failed:

// Assume failure until the call returns normally.
boolean isXFailed = true;
boolean isYFailed = true;
try {
    Object outputX = ctx.callActivity("activity-x");
    isXFailed = false;
} catch (Exception e) {...}
try {
    Object outputY = ctx.callActivity("activity-y");
    isYFailed = false;
} catch (Exception e) {...}

......
Object outputA = ctx.callActivity("activity-a");
if (isXFailed && !isYFailed) {
   ctx.registerCompensation("compensation-b");
}

I have to say that the flexibility of the current proposal is very high, and users can always combine compensation strategies to meet their requirements.

From this point of view, I agree with Chris' suggestion: flexibility is more important.
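For readers who want to run the flag-based approach, here is a self-contained sketch in plain Java. It makes several assumptions: the static `callActivity` method is a stand-in that either returns or throws and is not the SDK's `ctx.callActivity`, a plain `Deque` stands in for `ctx.registerCompensation`, and the activity and compensation names are simply the ones used in the snippets above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ConditionalCompensation {
    // Stand-in for calling an activity; throws when the activity "fails".
    static String callActivity(String name, boolean fails) {
        if (fails) {
            throw new IllegalStateException(name + " failed");
        }
        return name + " ok";
    }

    // Returns the compensations that end up registered for a given X/Y outcome.
    static List<String> run(boolean xFails, boolean yFails) {
        Deque<String> compensations = new ArrayDeque<>();
        // Assume failure until the call returns normally.
        boolean isXFailed = true;
        boolean isYFailed = true;
        try {
            callActivity("activity-x", xFails);
            isXFailed = false;
        } catch (IllegalStateException e) {
            // Swallowed on purpose so the workflow can continue to activity-a.
        }
        try {
            callActivity("activity-y", yFails);
            isYFailed = false;
        } catch (IllegalStateException e) {
            // Swallowed on purpose, same as above.
        }

        callActivity("activity-a", false);
        // Register the compensation only when X failed and Y succeeded.
        if (isXFailed && !isYFailed) {
            compensations.push("compensation-b");
        }
        return new ArrayList<>(compensations);
    }

    public static void main(String[] args) {
        System.out.println(run(true, false));
        System.out.println(run(false, false));
    }
}
```

The decision rule lives entirely in user code, so changing it does not require any new filter or rule mechanism in the SDK or runtime.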

skyao · 2024-01-02T08:28:58Z

While this proposal sounds useful I would like to see the core implementation / common interface, API methods outlined in pseudocode.

It's very hard to interpret Java code and think through how this would work in Python for example.

Could you update the proposal to identify the generic structure that all SDKs must implement?

In this proposal we can then discuss whether this will make sense. After all, we want to have a unified implementation across all SDKs, so we will want to agree on the same implementation strategy in this proposal.

I think it can be done without a separate set of pseudocode. After the Java SDK implementation is merged, I will contact the maintainers of the Python and .NET SDKs so we can work together on implementing the saga pattern there. I will then help them define the Python and .NET APIs and implementations directly, and update this proposal with them.

I expect saga support for the Python and .NET SDKs to be available in Dapr v1.14, and I promise I'll support it.

artursouza · 2024-01-05T18:47:35Z

I agree with @cgillum to keep the activity and compensation loosely coupled to begin with. We can always add opinionated Facades on top if needed.

+1 binding

skyao · 2024-03-17T15:51:13Z

@artursouza Can we merge this PR now to add this proposal to the main branch?

joebowbeer · 2024-06-26T10:34:38Z

Would it be helpful to add references to Saga support in other workflow tools?

In Temporal, for example, there is a helper class in Java but in other languages there is only sample code:

https://temporal.io/blog/saga-pattern-made-easy

In Java, the Saga class keeps track of compensations for you ... In other language SDKs you can easily write the addCompensation and compensate functions yourself.

(Followed by links to examples for Go, PHP, Python, and TypeScript, in addition to Java.)

In Golang there is no try/catch hell and defer is built into the language.

Even in Java, consider whether this should be part of the SDK or whether it should be part of a contrib library or even a DSL built on top of (one of) the Dapr SDKs.

Questions:

In a workflow-as-code solution such as Dapr, why can't different custom solutions and patterns be accomplished "in code" using the features of the programming language and its libraries?

Is Saga such a singularly useful pattern that it should be built into every Dapr SDK?

skyao added the enhancement New feature or request label Dec 4, 2023

This was referenced Dec 4, 2023

[Proposal]Saga pattern support in Dapr workflow dapr/java-sdk#968

Closed

Add saga pattern support in Dapr workflow #48

Open

add saga support proposal

1ddf755

Signed-off-by: Sky Ao <[email protected]>

artursouza previously approved these changes Dec 5, 2023

halspang reviewed Dec 7, 2023

dapr deleted a comment from halspang Dec 18, 2023

only WorkflowActivity, no CompensatableWorkflowActivity

0dc1157

Signed-off-by: Sky Ao <[email protected]>

skyao dismissed artursouza’s stale review via 0dc1157 December 19, 2023 13:23
