Add saga pattern support in Dapr workflow #47

Open
wants to merge 2 commits into base: main

skyao
Member

@skyao skyao commented Dec 4, 2023

Related issue is #48

@berndverst
Member

While this proposal sounds useful I would like to see the core implementation / common interface, API methods outlined in pseudocode.

It's very hard to interpret Java code and think through how this would work in Python for example.

Could you update the proposal to identify the generic structure that all SDKs must implement?

In this proposal we can then discuss whether this will make sense. After all, we want to have a unified implementation across all SDKs, so we will want to agree on the same implementation strategy in this proposal.

artursouza
artursouza previously approved these changes Dec 5, 2023
Member

@artursouza artursouza left a comment

LGTM. Please get approval from the other SDK maintainers. This proposal is not a blocker for the Java SDK implementation, though that implementation might change based on how the proposal evolves, which is OK since the workflow SDK is at version 0.x.

@artursouza
Member

While this proposal sounds useful I would like to see the core implementation / common interface, API methods outlined in pseudocode.

It's very hard to interpret Java code and think through how this would work in Python for example.

Could you update the proposal to identify the generic structure that all SDKs must implement?

In this proposal we can then discuss whether this will make sense. After all, we want to have a unified implementation across all SDKs, so we will want to agree on the same implementation strategy in this proposal.

I agree. I think the implementation details will evolve, with the Java implementation serving as a reference. I am not blocking the Java implementation while waiting for this proposal to be merged, with the understanding that the implementation can change (which is OK for version 0.x of the workflow SDK).

@skyao
Member Author

skyao commented Dec 5, 2023

While this proposal sounds useful I would like to see the core implementation / common interface, API methods outlined in pseudocode.

It's very hard to interpret Java code and think through how this would work in Python for example.

Could you update the proposal to identify the generic structure that all SDKs must implement?

In this proposal we can then discuss whether this will make sense. After all, we want to have a unified implementation across all SDKs, so we will want to agree on the same implementation strategy in this proposal.

Good suggestion, I will update it soon.

@halspang halspang left a comment

Main comment is about the registering of the activity and its compensation separately, I'd prefer to not do that if we can avoid it.

I am a bit worried about having to reimplement this across different SDKs though, specifically around how we do error handling/compensation triggering. If we can register the activity together with its compensation, is it possible to have the runtime trigger it? I think that makes the code cleaner and leaves fewer implementation details in the SDK.

Chatted with Chris a little bit and it seems like the above isn't really something the runtime is aware of atm in terms of which activity is executing. I'm just curious if we've thought about it/how much work it may be and if it's worth it. In an ideal world I'd love to see the runtime be able to walk back the entire workflow itself but I may be off in terms of what it's capable of right now.

@kaibocai

kaibocai commented Dec 8, 2023

Thanks @skyao for this great proposal. I agree with @halspang's comments. Besides that, LGTM!
Just one more comment: in the proposed implementation, if the user enables Saga, the SDK handles the compensation logic for them in its catch block. What if the user has some specific logic to run when an exception happens? In other words, what if they need to do some work (mostly logging) before triggering compensation? The current proposal doesn't seem to give them a chance to do this. They can do it in their own catch block, but that runs counter to the proposed solution.

@mukundansundar
Contributor

mukundansundar commented Dec 9, 2023

Overall the proposal LGTM.

I agree with both @halspang's and @kaibocai's comments.
Providing an interface that an activity can implement to add the compensation feature would be great.
Also, moving the logic that calls the compensation function into the workflow runtime would make the business logic code clearer.

Should we add another function like compensateOn(Predicate<T> result), for cases where no exception is thrown but compensation must still be triggered because of a null result or some value in the result? The Predicate function could be used to decide this.

Additionally, should a hook be provided so that users can look into the error/exception if thrown before the compensation is triggered i.e. onError() is called before compensate() is called?
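
One way to picture the compensateOn idea above is a small helper that runs a compensation when a predicate over the activity result matches, even though no exception was thrown. The sketch below is only an illustration under stated assumptions: `CompensateOnSketch`, `callWithCompensateOn`, and the `Supplier`/`Runnable` stand-ins for an activity and its compensation are hypothetical names and are not part of the Dapr SDK or of this proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names exist in the Dapr SDK.
class CompensateOnSketch {

    // Runs the activity, then runs the compensation if the predicate
    // flags the result as a failure (for example, a null result).
    static <T> T callWithCompensateOn(Supplier<T> activity,
                                      Predicate<T> shouldCompensate,
                                      Runnable compensation) {
        T result = activity.get();
        if (shouldCompensate.test(result)) {
            compensation.run();
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();

        // Activity stand-in that "fails" by returning null instead of throwing.
        String result = callWithCompensateOn(
                () -> null,
                r -> r == null,
                () -> log.add("compensated"));

        System.out.println("result=" + result + ", log=" + log);
    }
}
```

In a loosely coupled design the same effect can also be written as a plain `if` on the result in workflow code, so a helper like this would be a convenience rather than a requirement.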

@olitomlinson

I would love to see first-class support for Sagas in Workflows across all languages.

FWIW I had the same idea a good few months ago (which was based on the same idea I had of using Sagas for Azure Durable Functions 2 years ago) so it's nice to see that you've arrived in a place along the same line as where I was thinking!

@shubham1172
Member

The proposal LGTM overall, thanks @skyao. Agree with the comments here, specifically (1) not having to register a compensation separately, and (2) if we can offload certain things to runtime, to avoid having to duplicate this to all SDKs and maintain them.

@skyao
Member Author

skyao commented Dec 14, 2023

I would love to see first-class support for Sagas in Workflows across all languages.

We'll start with Java, and hopefully this proposal will be accepted soon so that the saga pattern implementation in the Dapr Java SDK can be released in Dapr v1.13. I then plan to add Python and .NET support in the next Dapr release, v1.14.

Chris has agreed to do some optimization work in workflow for saga support, so it will be, as you put it, "first-class support".

This should all happen soon.

@cgillum

cgillum commented Dec 15, 2023

Sharing my thoughts on the feedback received so far:

Main comment is about the registering of the activity and its compensation separately, I'd prefer to not do that if we can avoid it.

@halspang (and others) if I understand the concern correctly, this is about reducing boilerplate. While I sympathize with that, I worry about how this can make the saga pattern overly opinionated and less useful. For example, I don't think there will always be a 1:1 relationship between an activity and its compensation. Furthermore, the exact compensation strategy (which activity to call) and parameters may also need to be different for the same activity executed at different points in the workflow. I would prefer that we start by erring on the side of a loosely coupled design.

In an ideal world I'd love to see the runtime be able to walk back the entire workflow itself

I understand the appeal of this idea but have two major concerns:

  1. This would require non-trivial changes in the workflow runtime and I'm not convinced that the benefit justifies the cost at this point, especially given that this is a new feature that hasn't been developer tested. Though redundant, an SDK implementation will be cheaper to build, maintain, and change in the future based on user feedback.

  2. This could necessarily make the compensation model even more inflexible because the runtime will have to make specific assumptions about when and how to invoke compensation logic.

I'll also point out that the activity registration verbosity problem can be solved in other ways, such as what @mukundansundar has proposed in the Python SDK. In other words, we may be able to treat these concerns separately.

My overarching hope is that the saga implementation can be loosely coupled to the overall SDK logic, especially given that it's a brand-new thing that we want to get more real-world feedback on.
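
To make the loose-coupling argument above concrete, here is a minimal sketch in which compensations are registered separately from activity calls, so the same activity can be paired with a different compensation (and different input) at each call site. Everything in it is an assumption made for illustration: `LooseSagaSketch`, `Compensation`, `registerCompensation`, `drainInCompensationOrder`, and the activity names are hypothetical and do not reflect the actual Dapr SDK API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a loosely coupled saga helper; not the Dapr SDK API.
class LooseSagaSketch {

    // A compensation is recorded as an activity name plus the input to call it with.
    static final class Compensation {
        final String activityName;
        final Object input;

        Compensation(String activityName, Object input) {
            this.activityName = activityName;
            this.input = input;
        }
    }

    private final Deque<Compensation> compensations = new ArrayDeque<>();

    // Registration is separate from calling the activity, so the workflow
    // decides per call site which compensation (if any) applies.
    void registerCompensation(String activityName, Object input) {
        compensations.push(new Compensation(activityName, input));
    }

    // Returns the compensations last-registered-first and clears the list.
    List<Compensation> drainInCompensationOrder() {
        List<Compensation> ordered = new ArrayList<>(compensations);
        compensations.clear();
        return ordered;
    }

    public static void main(String[] args) {
        LooseSagaSketch saga = new LooseSagaSketch();

        // First call site: a "debit" activity is undone by a "credit" activity.
        saga.registerCompensation("credit", "account-1");
        // Second call site: the same "debit" activity, but here the workflow
        // chooses a different compensation with different input.
        saga.registerCompensation("flagForManualReview", "account-2");

        for (Compensation c : saga.drainInCompensationOrder()) {
            System.out.println("compensate with " + c.activityName + "(" + c.input + ")");
        }
    }
}
```

Because nothing ties an activity to a fixed compensation, a more opinionated 1:1 pairing could still be layered on top of a helper like this later.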

@cgillum

cgillum commented Dec 15, 2023

if the user enables Saga, we handle the compensation logic for them in the catch block for the SDK, what if the user has some specific logic when an exception happens, in other words, they need to do some work (mostly logging) before trigger compensation. It seems in the current proposal we don't provide a chance for them to do this. They can do it in their catch block but that is just opposed to the proposed solution.

@kaibocai I agree that the ability to have some custom logic is important, which is partly why I am strongly hesitant to tightly couple an activity's execution to its compensation. In the proposed model, which is loosely coupled, developers can execute custom logic (e.g., logging) by catching exceptions, logging in the catch block, and then rethrowing the exception (or using manual compensation) to trigger the compensation. I don't think this approach is opposed to the proposed solution. I think the auto-compensation done by the SDK should be seen as a convenience feature.
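
The catch, log, and rethrow flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: `CatchLogRethrowSketch`, `WorkflowBody`, and `runWithAutoCompensation` are hypothetical names, and the auto-compensation wrapper only stands in for whatever the SDK would do when a workflow throws.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; runWithAutoCompensation stands in for SDK behavior.
class CatchLogRethrowSketch {

    interface WorkflowBody {
        void run(Deque<Runnable> compensations, List<String> events);
    }

    // Stand-in for the SDK side: runs the workflow body and, if it throws,
    // runs the registered compensations in reverse order before rethrowing.
    static void runWithAutoCompensation(WorkflowBody body, List<String> events) {
        Deque<Runnable> compensations = new ArrayDeque<>();
        try {
            body.run(compensations, events);
        } catch (RuntimeException e) {
            while (!compensations.isEmpty()) {
                compensations.pop().run();
            }
            throw e;
        }
    }

    // User workflow code: custom logic runs in the catch block, and the
    // rethrow is what lets the surrounding auto-compensation take over.
    static void workflow(Deque<Runnable> compensations, List<String> events) {
        try {
            events.add("debit");
            compensations.push(() -> events.add("credit"));
            throw new IllegalStateException("shipping failed");
        } catch (IllegalStateException e) {
            events.add("logged: " + e.getMessage());
            throw e;
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        try {
            runWithAutoCompensation(CatchLogRethrowSketch::workflow, events);
        } catch (RuntimeException e) {
            events.add("workflow failed");
        }
        System.out.println(events);
    }
}
```

In this sketch the user's logging happens before any compensation runs, without needing a dedicated hook.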

@cgillum

cgillum commented Dec 15, 2023

Additionally, should a hook be provided so that users can look into the error/exception if thrown before the compensation is triggered i.e. onError() is called before compensate() is called?

@mukundansundar similar to my previous responses, this isn't necessary with the current proposal because you can control if/when the compensation gets triggered. These hooks that you're suggesting are only required if we tightly couple an activity invocation with its compensation logic, which I'm arguing we shouldn't do because it creates inflexibility (and requires us to complicate the design with behavior customizing hooks, etc.).

@olitomlinson

Having thought this through, I agree with @cgillum's comments.

Tightly coupling the compensation action to the Activity is very opinionated.

Let's assume that, for a given customer's use case, the opinionated model worked fine for a while, but then the user wanted to change the Workflow to do something that doesn't follow the opinionated model. How would they 'break out' and write a custom compensation just for that one Activity?

@dapr dapr deleted a comment from halspang Dec 18, 2023
@skyao
Member Author

skyao commented Dec 18, 2023

@halspang I'm so sorry, I deleted a comment by mistake.

This deleted note and the previous one said the same thing, and I replied to them together.

@DeepanshuA

DeepanshuA commented Dec 21, 2023

Went through the proposal today. @skyao I think it's a great proposal that would immensely benefit users.

I also agree with the comments about keeping registration of compensation separate from activity registration.

@skyao I also wanted to understand a bit more about how compensation will work in different scenarios. For example, Activity A's compensation may need to be called if the workflow fails at Activity X, but may NOT need to be called if the workflow fails at Activity Y.
Will there be rules/filters designed for this as part of the infrastructure, OR will users need to express it explicitly in code, depending on language-specific error/exception semantics?

@skyao
Member Author

skyao commented Jan 2, 2024

@skyao I also wanted to understand a bit more on lines of how compensation will work in different scenarios i.e. An Activity A's compensation may need to be called, if workflow fails at an activity X's level, but Activity A's compensation may NOT be needed to be called if workflow fails at Activity Y's level. Will there be some rules/ filters that need to be designed accordingly, as a part of infra here OR will user need to explicitly mention in code, depending on language specific error/exception semantics?

Currently we have not considered compensation logic this complex; the decision of whether to compensate spans multiple activities and requires some data shared across them. Of course, if activity x/y reports its failure through its output, the check can be as simple as:

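// Pseudocode: assumes each activity output exposes isOK().
Object outputX = ctx.callActivity("activity-x");
Object outputY = ctx.callActivity("activity-y");

......
Object outputA = ctx.callActivity("activity-a");
// Register the compensation only when x failed and y succeeded.
if (!outputX.isOK() && outputY.isOK()) {
   ctx.registerCompensation("compensation-b");
}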

But if activity x/y fails with an exception, users have to use try/catch so that the workflow can continue on to activity-a after activity-x or activity-y has failed:

// Pseudocode: each flag stays true unless the activity call returns normally.
boolean isXFailed = true;
boolean isYFailed = true;
try {
    Object outputX = ctx.callActivity("activity-x");
    isXFailed = false;
} catch (Exception e) {...}
try {
    Object outputY = ctx.callActivity("activity-y");
    isYFailed = false;
} catch (Exception e) {...}

......
Object outputA = ctx.callActivity("activity-a");
// Register the compensation only when x failed and y succeeded.
if (isXFailed && !isYFailed) {
   ctx.registerCompensation("compensation-b");
}

I have to say that the current proposal is very flexible, and users can always combine compensation strategies to meet their requirements.

From this point of view, I agree with Chris' suggestion: flexibility is more important.

@skyao
Member Author

skyao commented Jan 2, 2024

While this proposal sounds useful I would like to see the core implementation / common interface, API methods outlined in pseudocode.

It's very hard to interpret Java code and think through how this would work in Python for example.

Could you update the proposal to identify the generic structure that all SDKs must implement?

In this proposal we can then discuss whether this will make sense. After all, we want to have a unified implementation across all SDKs, so we will want to agree on the same implementation strategy in this proposal.

I think it can be done without a separate set of pseudocode. After the Java SDK implementation is merged, I will contact the maintainers of the Python SDK and .NET SDK so that we can work together on implementing the saga pattern in those SDKs. I will then help the maintainers define the Python and .NET APIs and implementations directly and add them to this proposal.

I expect saga support for the Python SDK and .NET SDK to be available in Dapr v1.14, and I promise I'll support it.

@artursouza
Member

I agree with @cgillum to keep the activity and compensation loosely coupled to begin with. We can always add opinionated Facades on top if needed.

+1 binding

@skyao
Member Author

skyao commented Mar 17, 2024

@artursouza Can we merge this PR now to add this proposal to the main branch?

@joebowbeer

joebowbeer commented Jun 26, 2024

Would it be helpful to add references to Saga support in other workflow tools?

In Temporal, for example, there is a helper class in Java but in other languages there is only sample code:

https://temporal.io/blog/saga-pattern-made-easy

In Java, the Saga class keeps track of compensations for you ... In other language SDKs you can easily write the addCompensation and compensate functions yourself.

(Followed by links to examples for Go, PHP, Python, and TypeScript, in addition to Java.)
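
As a rough sense of scale for writing addCompensation and compensate by hand, here is a minimal Java sketch using only the standard library. It is an assumption-laden illustration, not Temporal's Saga class and not a Dapr API: `HandRolledSaga` is a hypothetical name, and the choice to keep running the remaining compensations after one fails (collecting the failures as suppressed exceptions) is just one possible policy.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical hand-written helper; not Temporal's Saga class or a Dapr API.
class HandRolledSaga {

    private final Deque<Runnable> compensations = new ArrayDeque<>();

    void addCompensation(Runnable compensation) {
        compensations.push(compensation);
    }

    // Runs every compensation, newest first. A failing compensation does not
    // stop the others; failures are attached to one exception thrown at the end.
    void compensate() {
        RuntimeException failure = null;
        while (!compensations.isEmpty()) {
            try {
                compensations.pop().run();
            } catch (RuntimeException e) {
                if (failure == null) {
                    failure = new RuntimeException("one or more compensations failed");
                }
                failure.addSuppressed(e);
            }
        }
        if (failure != null) {
            throw failure;
        }
    }

    public static void main(String[] args) {
        HandRolledSaga saga = new HandRolledSaga();
        StringBuilder trace = new StringBuilder();
        saga.addCompensation(() -> trace.append("undo-1 "));
        saga.addCompensation(() -> { throw new IllegalStateException("undo-2 failed"); });
        saga.addCompensation(() -> trace.append("undo-3 "));
        try {
            saga.compensate();
        } catch (RuntimeException e) {
            trace.append("failures=").append(e.getSuppressed().length);
        }
        System.out.println(trace);
    }
}
```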

In Golang there is no try/catch hell and defer is built into the language.

Even in Java, consider whether this should be part of the SDK or whether it should be part of a contrib library or even a DSL built on top of (one of) the Dapr SDKs.

Questions:

In a workflow-as-code solution such as Dapr, why can't different custom solutions and patterns be accomplished "in code" using the features of the programming language and its libraries?

Is Saga such a singularly useful pattern that it should be built into every Dapr SDK?

Labels
enhancement New feature or request