Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added proposal for finalizing distributed lock building block #43

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ItalyPaleAle
Copy link
Contributor

The distributed lock building block (dapr/dapr#3549) is currently in alpha and only partially-implemented. This is a proposal for completing the design and guiding the implementation to eventually stabilize the building block.

@yaron2
Copy link
Member

yaron2 commented Oct 30, 2023

Overall LGTM with a preference for option 1 + callback API for lost locks.

It would be great if we could utilize existing etags to create a generic mechanism that works across all etag supported state stores, as it would provide the following major benefits:

  1. Eliminates component specific code maintenance in components-contrib
  2. Eliminates maintenance of test infrastructure including conformance and certification tests for different components
  3. Provides the greatest selection of components to users out of the box
  4. Dramatically reduces chances of bugs by relying on a single implementation

@ItalyPaleAle
Copy link
Contributor Author

a preference for option 1 + callback API for lost locks.

May I ask the reasoning for preferring option 1?

It would be great if we could utilize existing etags to create a generic mechanism that works across all etag supported state stores

This is certainly possible.

My proposal here was to use actors. They offer a higher-level abstraction over just state stores, with better concurrency control (not optimistic, reducing the likelihood of conflicts that need to be retried), and support for things like reminders to perform a cleanup. Also, dogfooding is always good :)

@berndverst
Copy link
Member

@ItalyPaleAle can you also copy the proposal text directly into the description here? Makes it easier to read :)

@ItalyPaleAle
Copy link
Contributor Author

@ItalyPaleAle can you also copy the proposal text directly into the description here? Makes it easier to read :)

I would rather not, as then they'd need to be kept in sync. I also prefer if people leave comments in the code.

Copy link
Contributor

@JoshVanL JoshVanL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, some comments from me.

I'm more in favor of option 2- I think from a programing perspective this feels a lot more natural as you are acquiring a lock "object" in code which you can pass around etc. Are we concerned about clients not supporting HTTP/2? Are there any SDKs which don't?


message LockRequest {
// When set to true, if the resource is already locked, the method returns right away with an error, and does not wait for the resource to become available.
bool immediate = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't immediately (pun not intended) clear to me what this flag did without reading the description. Could we use some variation of wait which might be more natural? If we want to keep the default of no wait, then perhaps noWait?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

0011-B-Distributed-Lock-building-block.md Outdated Show resolved Hide resolved
Comment on lines +112 to +114
// ID of the resource whose lock needs to be renewed.
// This is an arbitrary value supplied by the user.
ResourceID string `json:"resourceId"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm mistaken, isn't LockID all we need here?

Copy link
Contributor Author

@ItalyPaleAle ItalyPaleAle Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the component we definitely need the resource ID. The component itself won't maintain the state - the runtime will

Comment on lines +112 to +114
// ID of the resource whose lock needs to be renewed.
// This is an arbitrary value supplied by the user.
ResourceID string `json:"resourceId"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are a bit inconsistent in the Dapr project about Id vs ID (even in this doc), and even between variable names and their json field encodings. I think we should try and be consistent moving forward.. I am firmly in the ID camp 🙂
FWIW, Kubernetes also goes with ID.

Comment on lines +124 to +125
// ID of the resource that is locked.
ResourceID string `json:"resourceId"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again re: needed because we have the LockID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, the component does need the resource ID too. The state will be kept in the runtime, not component.

- Dapr renews locks automatically in background. Apps do not need to invoke a method to renew a lock explicitly. The TTL sent in requests is only used to control how frequently locks are to be renewed (lower TTLs cause locks to be renewed more frequently, but allow for quicker detection of failed hosts).
- If the app becomes unhealthy (when app healthchecks are enabled) or if the sidecar crashes, locks are not renewed and are left to expire when the TTL ends.

> About force unlocking: this is implemented primarily to recover from situations where the lock was owned by an app that is known to have crashed. This is a potentially dangerous activity, as the previous owner of the lock is not informed (at least, not immediately) that the lock has been taken away, at least not until it tries to renew or relinquish the lock.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite torn as to whether we should include this. I can see the use case, however the ramifications of two apps thinking they both have a lock could potentially be huge and is giving developers a big foot gun. Perhaps we hold off on exposing this API and reevaluating if it comes up as a feature request? Not being implemented by all components is also a kicker.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been on the fence myself the entire time about this.

Proposals often indicate the "desired state", and they don't have to be implemented in full at the beginning. We can always create the API in the component and then expose it later.


message LockServerStream {
oneof lock_server_stream {
LockPing ping = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need ping? Surely the presence of the stream means that the lock is still active? Or does this mean if the network has failed downstream, worst case scenario the lock is improperly locked for is max ping interval time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for a ping is because it allows the server to send a message over the wire. Without that, if the connection drops, it's not immediately clear to the server that the connection is dropped.

@mikeee mikeee mentioned this pull request Mar 19, 2024
43 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants