Advanced transaction API #10

vigoo · 2024-04-11T13:22:52Z

The initially implemented Rust transaction API works like this:

It defines the transaction as an atomic region.
If an operation fails (user code returns Result::Err), it performs the rollback steps in reverse order.
It then uses set-oplog-index to restart from the beginning of the transaction.
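The steps above can be sketched in plain Rust. This is only an illustration under stated assumptions, not the library's actual API: `Step`, `run_transaction` and `max_attempts` are hypothetical names, the retry loop merely stands in for the jump performed with set-oplog-index, and only operations that completed successfully are compensated.

```rust
/// One transactional step: an action and the compensation that undoes it.
/// `S` stands in for whatever external state the operations touch.
/// (Hypothetical type, not part of the Golem library.)
pub struct Step<S> {
    pub execute: fn(&mut S) -> Result<(), String>,
    pub rollback: fn(&mut S),
}

/// Runs all steps in order. On the first `Err`, compensates the steps that
/// already completed, in reverse order, and retries from the first step,
/// giving up after `max_attempts` attempts.
pub fn run_transaction<S>(
    state: &mut S,
    steps: &[Step<S>],
    max_attempts: u32,
) -> Result<(), String> {
    let mut last_error = String::from("max_attempts was zero");
    for _attempt in 0..max_attempts {
        let mut completed: Vec<&Step<S>> = Vec::new();
        let mut failure = None;
        for step in steps {
            match (step.execute)(state) {
                Ok(()) => completed.push(step),
                Err(e) => {
                    failure = Some(e);
                    break;
                }
            }
        }
        match failure {
            None => return Ok(()),
            Some(e) => {
                // Undo the completed steps, newest first.
                for step in completed.iter().rev() {
                    (step.rollback)(state);
                }
                last_error = e;
                // Assumption: a Golem-based implementation would restart here
                // via set-oplog-index; this loop only imitates that restart.
            }
        }
    }
    Err(last_error)
}
```

Note that everything in this sketch lives in memory: if the process dies between an `execute` call and the rollback loop, the `completed` list is lost along with it.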

This means that if the worker gets restarted during the transaction for any other reason, such as a panic or anything that forces the worker executor to restart (hardware failure, deployment, etc.), the compensation actions of the partially executed transaction will not run. There is no way to catch this and execute code, because the executor itself may no longer exist. When the worker is recovered on another executor, it simply reaches the transaction during replay and, because of the atomic region, decides to rerun it. Golem currently exposes no way for user code (meaning the transaction library) to determine which operations were partially done, even though they are recorded in the oplog.

However, we can take advantage of more Golem host features to solve this, in the following way:

First, we need to make sure the operations are serializable. The simplest way to do this is to list all the operations in advance in the transaction function (only their types, as they are still parametrized by an input value), and then call these "registered" operations within the transaction. This way, if an operation's input is serializable, the operation itself is also serializable, because the registration step can associate a simple numeric ID with it.
We also generate a transaction UUID before the atomic region starts. This runs only once and we get the same UUID during replay, so we have a persistent unique identifier for the transaction.
Before executing an operation, we use Golem's WASI key-value implementation to store the begin (or end, depending on the idempotence mode?) of each operation together with its inputs.
We can keep the same user-land rollback logic as in the simple version, extended so that it removes these entries from the key-value store as the operations get rolled back.
Before actually starting the transaction, but after we already have the transaction ID, we add a non-persisted region where we check whether the key-value store contains any leftover operations to be rolled back. If it does, we perform these rollback actions before restarting the transaction. Since the operation ID and the serialized input are stored, we can reconstruct each operation and call rollback on it.

This way, if the execution dies in the middle of a transaction, or even during the rollback, the leftover rollback actions will always be performed the next time the worker is recovered.
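A minimal sketch of this proposal, under several assumptions: `KvStore` is a plain in-memory `BTreeMap` standing in for Golem's WASI key-value store, `RegisteredOp` and `TxContext` are hypothetical names, inputs are passed as already-serialized strings, the transaction ID is supplied by the caller instead of being generated as a UUID, and journal entries are written at the begin of each operation. The restart after a rollback is left out.

```rust
use std::collections::BTreeMap;

/// Stand-in for the key-value store (assumption: an in-memory ordered map).
pub type KvStore = BTreeMap<String, String>;

/// A registered operation; its index in the registry is its numeric ID.
pub struct RegisteredOp<S> {
    pub execute: fn(&mut S, &str) -> Result<(), String>,
    pub rollback: fn(&mut S, &str),
}

/// Everything one transaction run needs (hypothetical helper type).
pub struct TxContext<'a, S> {
    pub tx_id: &'a str,
    pub registry: &'a [RegisteredOp<S>],
    pub kv: &'a mut KvStore,
    pub state: &'a mut S,
}

impl<'a, S> TxContext<'a, S> {
    fn prefix(&self) -> String {
        format!("{}/", self.tx_id)
    }

    /// Journals the operation (ID + serialized input) before running it.
    pub fn execute_journaled(
        &mut self,
        seq: usize,
        op_id: usize,
        input: &str,
    ) -> Result<(), String> {
        // Zero-padded sequence numbers keep the keys in execution order.
        let key = format!("{}{:08}", self.prefix(), seq);
        self.kv.insert(key, format!("{op_id}:{input}"));
        (self.registry[op_id].execute)(self.state, input)
    }

    /// Rolls back every journaled operation of this transaction, newest
    /// first, removing each entry only after its rollback has run.
    pub fn roll_back_leftovers(&mut self) -> usize {
        let prefix = self.prefix();
        let keys: Vec<String> = self
            .kv
            .keys()
            .filter(|k| k.starts_with(&prefix))
            .cloned()
            .collect();
        for key in keys.iter().rev() {
            let entry = self.kv[key].clone();
            let (op_id, input) = entry.split_once(':').expect("malformed journal entry");
            let op_id: usize = op_id.parse().expect("malformed operation id");
            (self.registry[op_id].rollback)(self.state, input);
            self.kv.remove(key);
        }
        keys.len()
    }

    /// Recovers leftovers, then runs `calls` (operation ID, serialized input).
    pub fn run(&mut self, calls: &[(usize, &str)]) -> Result<(), String> {
        // Recovery step: compensate whatever an interrupted run left behind.
        self.roll_back_leftovers();
        for (seq, (op_id, input)) in calls.iter().enumerate() {
            if let Err(e) = self.execute_journaled(seq, *op_id, input) {
                self.roll_back_leftovers();
                return Err(e);
            }
        }
        // Assumption: a finished transaction clears its own journal entries.
        let prefix = self.prefix();
        self.kv.retain(|k, _| !k.starts_with(&prefix));
        Ok(())
    }
}
```

Two consequences of these choices are worth noting. Because an entry is written before its operation runs, a rollback may be called for an operation that never took effect. Because an entry is removed only after its rollback has run, a crash in between causes that rollback to run again on the next recovery. In this sketch, rollbacks therefore need to be idempotent and safe to call for incomplete operations.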


vigoo mentioned this issue Apr 11, 2024

High level Rust API for Golem's transactional host functions #9

Merged
