-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add saga pattern support in Dapr workflow #47
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Sky Ao <[email protected]>
While this proposal sounds useful I would like to see the core implementation / common interface, API methods outlined in pseudocode. It's very hard to interpret Java code and think through how this would work in Python for example. Could you update the proposal to identify the generic structure that all SDKs must implement? In this proposal we can then discuss whether this will make sense. After well we want to have a unified implementation across all SDKs, so we will want to agree on the same implementation strategy in this proposal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Please, get approval from other SDK maintainers. It is not a blocker for Java SDK implementation, but it might change based on how this proposal evolves - which is OK since the workflow SDK is version 0.x.
I agree. I think the implementation detail will evolve as the Java implementation serves as a reference. I am not blocking the Java implementation waiting for this proposal to be merged but knowing the implementation can change (ok for version 0.x of the workflow SDK). |
Good suggestion, I will update it soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main comment is about the registering of the activity and its compensation separately, I'd prefer to not do that if we can avoid it.
I am a bit worried about having to reimplement this across different SDKs though, specifically around how we do error handling/compensation triggering. If we can register the activity together with its compensation, is it possible to have the runtime trigger it? I think that makes the code cleaning and leaves less implementation details in the SDK.
Chatted with Chris a little bit and it seems like the above isn't really something the runtime is aware of atm in terms of which activity is executing. I'm just curious if we've thought about it/how much work it may be and if it's worth it. In an ideal world I'd love to see the runtime be able to walk back the entire workflow itself but I may be off in terms of what it's capable of right now.
Thanks @skyao for this great proposal. I agree with @halspang's comments. Besides that, LGTM! |
Overall the proposal LGTM. I agree with both @halspang's and @kaibocai's comments. Should we add another function like Additionally, should a hook be provided so that users can look into the error/exception if thrown before the compensation is triggered i.e. |
I would love to see first-class support for Sagas in Workflows across all languages. FWIW I had the same idea a good few months ago (which was based on the same idea I had of using Sagas for Azure Durable Functions 2 years ago) so it's nice to see that you've arrived in a place along the same line as where I was thinking! |
The proposal LGTM overall, thanks @skyao. Agree with the comments here, specifically (1) not having to register a compensation separately, and (2) if we can offload certain things to runtime, to avoid having to duplicate this to all SDKs and maintain them. |
We'll start with java, and hopefully this proposal will be accepted soon so that the implementation of saga pattern in the dapr java sdk can be released in the dapr v1.13. Then I plan to add python and .net support in next dapr release v1.14. Chris has agreed to do some optimization work in workflow for saga support, as you see, "first-class support". This should all happen soon. |
Sharing my thoughts on the feedback received so far:
@halspang (and others) if I understand the concern correctly, this is about reducing boilerplate. While I sympathize with that, I worry about how this can make the saga pattern overly opinionated and less useful. For example, I don't think there will always be a 1:1 relationship between an activity and its compensation. Furthermore, the exact compensation strategy (which activity to call) and parameters may also need to be different for the same activity executed at different points in the workflow. I would prefer that we start by erring on the side of a loosely coupled design.
I understand the appeal of this idea but have two major concerns:
I'll also point out that the activity registration verbosity problem can be solved in other ways, such as what @mukundansundar has proposed in the Python SDK. In other words, we may be able to treat these concerns separately. My overarching hope is that the saga implementation can be loosely coupled to the overall SDK logic, especially given that it's a brand-new thing that we want to get more real-world feedback on. |
@kaibocai I agree that the ability to have some custom logic is important, which is partly why I strongly hesitate against tightly coupling an activity's execution to its compensation. In the proposed model, which is loosely coupled, developers can execute custom logic (e.g., logging) by catching exceptions, logging in the catch block, and then rethrowing the exception (or use manual compensation) to trigger the compensation. I don't think this approach is opposed to the proposed solution. I think the auto-compensation done by the SDK should be seen as a convenience feature. |
@mukundansundar similar to my previous responses, this isn't necessary with the current proposal because you can control if/when the compensation gets triggered. These hooks that you're suggesting are only required if we tightly couple an activity invocation with its compensation logic, which I'm arguing we shouldn't do because it creates inflexibility (and requires us to complicate the design with behavior customizing hooks, etc.). |
Having thought this through, I agree with @cgillum comments. Tightly coupling the compensation action to the Activity is very opinionated. Let's assume that given a customers use-case, the opinionated model worked fine for a while, but then the user wished to change the Workflow to do something different that doesn't follow the opinionated model, how would they 'break-out' and write a custom compensation just for that one Activity? |
@halspang I'm so sorry to delete a comment by mistaken. This deleted note and the previous one said the same thing, and I replied to them together. |
Signed-off-by: Sky Ao <[email protected]>
Went through the proposal today. @skyao I think it's a great proposal, that would immensely benefit users. I too agree with comments regarding keeping registration of compensation separate than activity registration. @skyao I also wanted to understand a bit more on lines of how compensation will work in different scenarios i.e. An Activity A's compensation may need to be called, if workflow fails at an activity X's level, but Activity A's compensation may NOT be needed to be called if workflow fails at Activity Y's level. |
Currently we have not considered such a complex compensation logic; this judgment of whether to compensate is made across activities, and it requires some global data across activities. Of course, if the activiy x/y failed with output, it can be such a simple judgment: Object output-x = ctx.callActivity("activity-x");
Object output-y = ctx.callActivity("activity-y");
......
Object output-a = ctx.callActivity("activity-a");
if (!output-x.isOK() && output-y.isOK) {
ctx.registerCompensation("compensation-b")
} But if the activiy x/y failed with exceptions, users have to do try/catch to let the workflow continue to execute activity-a when activity-x and activity-y are failed: boolean isXFailed = true;
boolean isYFailed = true;
try {
Object output-x = ctx.callActivity("activity-x");
isXFailed = false;
} catch {...}
try{
Object output-y = ctx.callActivity("activity-y");
isYFailed = false;
} catch {...}
......
Object output-a = ctx.callActivity("activity-a");
if (isXFailed && !isYFailed) {
ctx.registerCompensation("compensation-b")
} I have to say that the flexibility of current proposal is very high, and users can always combine compensation strategies that meet their requirements. From this point of view, I agree with Chris' suggestion: flexibility is more important. |
I think it can be done without another set of pseudo-code. After the java-sdk implementations are merged, I will contact the maintainers of the python sdk and .net sdk to come together and implement saga mode in the python sdk and .net sdk. I will then help the maintainers define the python and .net APIs and implementations directly and update them to this proposal. I expect saga support for python-sdk and .net sdk to be available in dapr v1.14, and I promise I'll support it. |
I agree with @cgillum to keep the activity and compensation loosely coupled to begin with. We can always add opinionated Facades on top if needed. +1 binding |
@artursouza Can we merge this PR now to add this proposal into main branch. |
Would it be helpful to add references to Saga support in other workflow tools? In Temporal, for example, there is a helper class in Java but in other languages there is only sample code: https://temporal.io/blog/saga-pattern-made-easy
(Followed by links to examples for Go, PHP, Python, and TypeScript, in addition to Java.) In Golang there is no try/catch hell and Even in Java, consider whether this should be part of the SDK or whether it should be part of a contrib library or even a DSL built on top of (one of) the Dapr SDKs. Questions: In a workflow-as-code solution such as Dapr, why can't different custom solutions and patterns be accomplished "in code" using the features of the programming language and its libraries? Is Saga such a singularly useful pattern that it should to be built into every Dapr SDK? |
Related issue is #48