Suggestions for MLS state machine #303
Conversation
AFAIK, MLS lets you define the logic for which proposals are accepted or rejected. Another possibility could be to mandate that each client will only commit its own proposals. OpenMLS has a `remove_pending_proposal` method that we could use instead of committing pending proposals.
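As a rough sketch of that policy (plain Rust with made-up `PendingProposal`/`GroupState` types, not OpenMLS's actual API), a client could partition its pending proposals and commit only the ones it authored, dropping the rest in the spirit of `remove_pending_proposal`:

```rust
// Hypothetical pending-proposal store; illustrative only.
struct PendingProposal {
    proposer_id: String,
    id: u64,
}

struct GroupState {
    own_id: String,
    pending: Vec<PendingProposal>,
}

impl GroupState {
    /// Split pending proposals: our own go into the next commit,
    /// everyone else's are removed instead of being committed.
    fn drain_for_commit(&mut self) -> Vec<u64> {
        let own = self.own_id.clone();
        let (mine, others): (Vec<PendingProposal>, Vec<PendingProposal>) =
            self.pending.drain(..).partition(|p| p.proposer_id == own);
        drop(others); // removed, not committed
        mine.into_iter().map(|p| p.id).collect()
    }
}
```

In the real implementation this would map onto OpenMLS's pending-proposal handling rather than a local `Vec`.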
1. Sequentially process each payload. For each payload, update the `last_message_timestamp_ns` and perform any corresponding database writes for that payload in a single transaction
1. When the sync is complete, release the lock by setting `lock_until_ns` to 0

This flow will be similar regardless of whether the sync happens via a poll-based or subscription-based mechanism. For a subscription-based mechanism, the lock will be obtained at the start of the subscription, and extended on a heartbeat interval, until the subscription is closed.
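The locking flow above could look roughly like this minimal in-memory sketch (the real version would persist `lock_until_ns` and `last_message_timestamp_ns` in the client database and wrap each payload's writes in the same transaction as the cursor update):

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// In-memory stand-in for the group's sync-state row.
struct SyncState {
    lock_until_ns: u64,
    last_message_timestamp_ns: u64,
}

fn now_ns() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_nanos() as u64
}

impl SyncState {
    /// Try to take the sync lock for `duration_ns`. Fails if another
    /// sync already holds an unexpired lock.
    fn try_lock(&mut self, duration_ns: u64) -> bool {
        let now = now_ns();
        if self.lock_until_ns > now {
            return false; // another sync is in progress
        }
        self.lock_until_ns = now + duration_ns;
        true
    }

    /// Process payloads in timestamp order, advancing the cursor once
    /// per payload (the real version would also apply that payload's
    /// database writes in the same transaction).
    fn process_payloads(&mut self, payload_timestamps_ns: &[u64]) {
        for &ts in payload_timestamps_ns {
            if ts <= self.last_message_timestamp_ns {
                continue; // already processed
            }
            self.last_message_timestamp_ns = ts;
        }
    }

    /// Release the lock when the sync completes.
    fn unlock(&mut self) {
        self.lock_until_ns = 0;
    }
}
```

A heartbeat for the subscription case would just call `try_lock`-style extension periodically before `lock_until_ns` expires.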
In the absence of really robust subscriptions (which I think is going to be quite hard to implement for V1) my recommendation would be to not adjust any locks on the subscription and allow state syncs to happen in parallel.
If subscriptions are not robust, I feel like it causes a lot of downstream problems.
- If a commit message is lost, subsequent payloads are unreadable. You can trigger a pull at this point, but it's added complexity.
- If we don't lock syncs while subscriptions are in progress, then we have concurrent processing of the same payloads, meaning one side or the other will fail. It becomes unclear if failures are due to bad payloads/missing epoch messages, or if they are due to parallel processing.
- It's not the best dev experience for consumers of the SDK.
I have some suggestions:
- We could make subscriptions more robust. We could pass the `last_synced_payload_timestamp` to the server when initiating subscriptions. Alternatively, we could have the server record the previous payload timestamp on each payload, so that the client can detect when it is missing payloads and re-sync.
- We could omit subscriptions from the initial MLS milestones and fix them up in later milestones.
> We could omit subscriptions from the initial MLS milestones and fix them up in later milestones.
We can talk to developers and see how much of a deal-breaker that would be. They'd also have to be OK with no push notifications. Together, that sounds like a tough sell, but I could be wrong.
> We could make subscriptions more robust. We could pass the `last_synced_payload_timestamp` to the server when initiating subscriptions. Alternatively, we could have the server record the previous payload timestamp on each payload, so that the client can detect when it is missing payloads and re-sync.
Even if we had the most robust streaming APIs, we also need to somehow deal with push notifications, which are even less reliable and less flexible. The `last_synced_payload_timestamp` isn't really an option for push. Having the server record the previous payload timestamp on each payload would be possible in our `StreamAllMessages` API, which powers push. It would require some changes to our back-end to include it in the message that gets gossiped in Waku. It also requires a lookup on each publish, which is probably an acceptable performance cost.
All options feel like some variation of: "some streamed messages are just a reminder to query the topic and sync from the last timestamp of our contiguous message history". There are degrees of this.
- Every subscription message or push notification just tells the client to sync via the query API. We don't even look at the payloads.
- Application messages can be processed directly from a stream/push notification if you are in the correct epoch. We inspect the payload, and if it is a Commit/Proposal we abort processing the streamed message (and roll back the database changes that delete the decryption keys), then sync from our last known timestamp via the query API.
- Any streamed message or push notification where the `previous_payload_timestamp` lines up with our local state gets processed immediately. If we haven't processed the message with the `previous_payload_timestamp`, we do a sync from the network from the last synced payload timestamp.
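The decision rule in the third option can be sketched as follows; `previous_payload_timestamp_ns` is the proposed (not yet existing) envelope field, and the types here are purely illustrative:

```rust
/// A streamed envelope carrying the hypothetical
/// previous_payload_timestamp field discussed above.
struct StreamedEnvelope {
    previous_payload_timestamp_ns: u64,
}

enum Action {
    ProcessImmediately,
    SyncFromQueryApi { from_ns: u64 },
}

/// Process the streamed message only when its predecessor is the last
/// payload we have locally; otherwise fall back to a query-based sync
/// from our last contiguous timestamp.
fn decide(last_synced_ns: u64, env: &StreamedEnvelope) -> Action {
    if env.previous_payload_timestamp_ns == last_synced_ns {
        Action::ProcessImmediately
    } else {
        Action::SyncFromQueryApi { from_ns: last_synced_ns }
    }
}
```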
The third option likely gives us the fewest queries to the network, depending on the percentage of out-of-order messages received via streaming. But it is a bit of a pain to roll out: we'd need to update not just our nodes and `Envelope` proto schema but also the example-notification-server. And anyone using MLS would have to update their notification-receiving code to handle the new field. Maybe it's worth the cost, but we should take it into consideration.
I do like the third option. I hadn't considered adding the `previous_payload_timestamp` to all our streamed envelopes.
Given that all options look like option 1 in some cases, I wonder if we should start with that to unblock everything and give us streaming methods quite easily. Then we can layer on option 2 or 3 depending on how we feel about the scope of changes.
That's a really good framing, thanks for writing it up. Okay, this makes sense to me!
That would allow us to check the box of "supporting proposals" in the short term, without actually having to support them. We'd have to think through what it would look like to change this later. Maybe having a single member of the group on a newer version that didn't remove the proposals, but instead created a commit, would be fine. Would just require a bit of investigation.
The simplest is to always make processing sequential. Gives us the same guarantees no matter where the message came from. Push notifications and subscriptions that contain a commit message would just trigger a sync using the Query API that would ensure all previous commits were applied first. Application messages received via push or subscription are probably safe to attempt processing immediately (although they may fail if you had somehow missed a prior commit). Failed processing could be remedied with a sync-by-query. It feels like anything else breaks the total ordering assumption baked into MLS. We are going to need to design a solution to the total ordering problem for decentralization, but we've already decided to table that for a later iteration of the protocol.
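That sequential policy might dispatch roughly like this (illustrative types; `Payload` and `StreamAction` are not real SDK names):

```rust
enum Payload {
    Commit,
    Proposal,
    Application(Vec<u8>),
}

enum StreamAction {
    /// Ensure all prior commits are applied first via the Query API.
    TriggerQuerySync,
    /// Safe to attempt immediately; fall back to sync-by-query on failure.
    TryProcessInline,
}

/// Dispatch for a message received via push or subscription: commits and
/// proposals always route through the query API so processing stays
/// sequential, while application messages may be attempted immediately.
fn on_streamed(payload: &Payload) -> StreamAction {
    match payload {
        Payload::Commit | Payload::Proposal => StreamAction::TriggerQuerySync,
        Payload::Application(_) => StreamAction::TryProcessInline,
    }
}
```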
I think I understand - we can't process the same payload twice, so is the suggestion here to have an application sync trigger commit processing first? But what happens if you have an application sync in progress at the time that you make a commit, or halfway through the commit process?
We could have some sort of locking mechanism that prevents this. These updates should be relatively fast and simultaneous syncs should be infrequent. But I don't think we need that. If a message fails to decrypt because it's already been processed and the keys have been discarded, we just move on to the next message. As long as both processes are going through messages sequentially and talking to the same DB, everything should just work, right? The system is going to need to be very resilient to decryption failures anyway, since anyone can publish junk to the group topic, plus the fact that your own messages are unreadable.
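A sketch of that resilient sequential loop (all names hypothetical): decryption failures are skipped and the cursor still advances, so two concurrent passes over the same DB converge on the same state:

```rust
/// Result of attempting to process one payload. Decryption can fail for
/// benign reasons: the payload was already processed and its keys
/// deleted, it is our own message, or it is junk on the group topic.
enum ProcessResult {
    Ok(String),
    DecryptionFailed,
}

// Stand-in for the decrypt step: payloads at or below the cursor (or
// with no readable body) fail, simulating discarded ratchet keys.
fn try_process(cursor_ns: u64, ts: u64, body: Option<&str>) -> ProcessResult {
    match body {
        Some(text) if ts > cursor_ns => ProcessResult::Ok(text.to_string()),
        _ => ProcessResult::DecryptionFailed,
    }
}

/// Walk payloads sequentially, skipping failures and moving on.
fn resilient_sync(cursor_ns: &mut u64, payloads: &[(u64, Option<&str>)]) -> Vec<String> {
    let mut out = Vec::new();
    for &(ts, body) in payloads {
        if let ProcessResult::Ok(text) = try_process(*cursor_ns, ts, body) {
            out.push(text);
        }
        *cursor_ns = (*cursor_ns).max(ts); // advance past failures too
    }
    out
}
```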
This was a really helpful discussion. I'll try to succinctly describe the various edge cases we discussed, at least so that I understand how to implement the thing, but with that in mind I think I'm happy with this!
Will try to keep these suggestion PRs small. For this one:

- `post_commit_data` field on intent rather than a separate table

Other things worth considering: