Alexey Rusakov edited this page Oct 8, 2022 · 3 revisions

The notes below are for library developers. Client authors using the library don't need to know any of this, except that (with regard to section 1 below) under no circumstances should they update account data on the server bypassing the library, because that is a sure way to lose data in race conditions.

1. Handling direct chat events

Account data events, and especially m.direct, are prone to race conditions when being updated. The problem with the current design in Matrix is that there's no monotonic function to update account data between the client(s) and the server, which makes it impossible to build a CRDT around them. Moreover, the server code doesn't care about conflicts, expecting changes to propagate instantaneously. If a user does something locally, the client has to be as careful as possible not to overlap with updates coming from the server; if two change sets (one coming from the client and another being synced from the server) cross paths in mid-air, the server will just overwrite its account data event with the one that has come from the client.

This is not too much of a problem with m.tag events, since changes (almost) always fly from the client to the server, and the reverse direction only happens during the initial sync. There's an edge case when an end user is logged in on two clients and uses them interchangeably; but even then it can be (more or less) safely assumed that the end user won't shoot themselves in the foot by updating tags of the same room in two clients almost at once, in different ways.

It's a different situation with m.direct events though. Each m.direct event synced from the server and each m.direct update request going to the server carries the entire map of the user's direct chats. Compared to the room-specific m.tag, it's a bit easier to update the direct chat status of two different rooms in two client applications and end up with only one of the changes persisted on the server, due to race conditions caused by network or server delays. Moreover, there's a much more likely race condition when someone else invites the user to a direct chat while the user also updates the direct chat status of another room: according to The Spec, the invitee's client should add the room in question to direct chats when it receives an invitation with the is_direct flag set in its content. This is sufficient grounds for the server to occasionally overwrite m.direct updates.
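
The lost update described above can be reproduced with a toy model. The sketch below is not library code: FakeServer, putDirect and simulateLostUpdate are hypothetical names, and the user and room IDs are made up. It relies only on what the text states, namely that every upload carries the whole map and that the server keeps whichever map arrived last.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for the m.direct content: user ID -> room IDs
using DirectChatsMap = std::map<std::string, std::set<std::string>>;

// Hypothetical server that stores whatever map arrives last,
// without checking for conflicts
struct FakeServer {
    DirectChatsMap direct;
    void putDirect(const DirectChatsMap& content) { direct = content; }
};

// Two clients start from the same snapshot, each marks a different room
// as a direct chat, and both upload their full map
FakeServer simulateLostUpdate()
{
    FakeServer server;
    server.direct["@alice:example.org"].insert("!a:example.org");

    DirectChatsMap clientA = server.direct; // both clients have synced
    DirectChatsMap clientB = server.direct; // the same server state

    clientA["@bob:example.org"].insert("!b:example.org");
    clientB["@carol:example.org"].insert("!c:example.org");

    server.putDirect(clientA);
    server.putDirect(clientB); // replaces A's map entirely: A's change is gone
    return server;
}
```

After both uploads the server map contains the entries for @alice and @carol only; the direct chat with @bob added on client A is lost, even though the two clients touched different rooms.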

To mitigate that at least somewhat, the library sends m.direct updates only upon finishing the sync processing, and does so immediately, merging the changes from the client and from the server along the way. Each Connection maintains a pair of multihash maps somewhat resembling a 2P-Set that accumulates local changes (see dcLocalAdditions and dcLocalRemovals in Connection::Private), consisting of [user, direct chat room] pairs. The multihashes are cleared every time the merged state is sent to the server (if the server has accumulated a change between delivering the sync and receiving the merged state, only this change will be lost). A multihash pair is also used to propagate local and remote changes internally (within the client application, using the directChatsListChanged() signal) on an as-they-arrive basis, sparing client authors the trouble of calculating the change in m.direct state and simplifying add/remove logic in the UI.
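
As a rough illustration of this bookkeeping, the following sketch uses plain standard-library sets instead of the Qt multihashes. LocalDirectChats and its members are hypothetical names; only dcLocalAdditions and dcLocalRemovals come from the library, and the actual code in Connection::Private may differ in detail. Recording a change only when the local state really changes is an assumption of the sketch.

```cpp
#include <set>
#include <string>
#include <utility>

using DirectChat = std::pair<std::string, std::string>; // [user ID, room ID]
using DirectChatSet = std::set<DirectChat>;

// Hypothetical, simplified model of the per-connection bookkeeping
struct LocalDirectChats {
    DirectChatSet state;     // current local view of m.direct
    DirectChatSet additions; // plays the role of dcLocalAdditions
    DirectChatSet removals;  // plays the role of dcLocalRemovals

    void add(const DirectChat& dc)
    {
        if (state.insert(dc).second) // record only actual changes
            additions.insert(dc);
    }

    void remove(const DirectChat& dc)
    {
        if (state.erase(dc) > 0)
            removals.insert(dc);
    }

    // To be called once the merged state has been sent to the server
    void markSent()
    {
        additions.clear();
        removals.clear();
    }
};
```

Between two syncs, additions and removals together describe everything the user changed locally, which is what the merge at the end of the sync needs in order to tell local changes apart from remote ones.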

When the sync data arrive (see Connection::onSyncSuccess()), there are two possible cases:

  • If the server didn't receive changes from other clients of the same user account (the usual case), the current local state is immediately sent to the server, and the 2P-Set is cleared.
  • If the server has changes in m.direct state (rarer case), it sends the event in account_data. In that case the algorithm below is applied:
    • In addition to the local 2P-Set, stored in dcLocalAdditions/dcLocalRemovals, the remote 2P-Set is calculated and the local state is updated as follows:
      remoteRemovals := localState \ (remoteState ∪ dcLocalAdditions)
      remoteAdditions := (remoteState \ dcLocalRemovals) \ localState
      localState := (localState \ remoteRemovals) ∪ remoteAdditions
      
      This logic assumes that additions win over removals (unlike classic 2P-Sets) to avoid the tombstone semantics of removals in 2P-Sets, which is undesirable here.
    • The library sends the merged localState to the server.
    • The client application (already aware of local changes) is notified about the remote changes using this 2P-Set.
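
The three formulas above map directly onto set operations. The following is a minimal sketch using std::set rather than the library's multihashes; MergeResult, mergeDirectChats, difference and unite are hypothetical names introduced for illustration, and this is not the code of Connection::onSyncSuccess().

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <utility>

using DirectChat = std::pair<std::string, std::string>; // [user ID, room ID]
using DirectChatSet = std::set<DirectChat>;

// Set difference: a \ b
DirectChatSet difference(const DirectChatSet& a, const DirectChatSet& b)
{
    DirectChatSet result;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(result, result.end()));
    return result;
}

// Set union: a ∪ b
DirectChatSet unite(DirectChatSet a, const DirectChatSet& b)
{
    a.insert(b.begin(), b.end());
    return a;
}

struct MergeResult {
    DirectChatSet mergedState;     // new localState, to be sent to the server
    DirectChatSet remoteAdditions; // remote changes to report
    DirectChatSet remoteRemovals;  // to the client application
};

MergeResult mergeDirectChats(const DirectChatSet& localState,
                             const DirectChatSet& remoteState,
                             const DirectChatSet& localAdditions,
                             const DirectChatSet& localRemovals)
{
    MergeResult r;
    // Missing on the server and not added locally: removed remotely
    r.remoteRemovals =
        difference(localState, unite(remoteState, localAdditions));
    // Present on the server, not removed locally, unknown here: added remotely
    r.remoteAdditions =
        difference(difference(remoteState, localRemovals), localState);
    r.mergedState =
        unite(difference(localState, r.remoteRemovals), r.remoteAdditions);
    return r;
}
```

For instance, with localState = {A, B, C}, dcLocalAdditions = {C}, dcLocalRemovals = {D} and remoteState = {A, D, E}, the sketch yields remoteRemovals = {B}, remoteAdditions = {E} and a merged state of {A, C, E}: the local addition of C and the local removal of D both survive the merge.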

2. Saving unread event counts

In order to still keep the information about unread messages around without storing the local timeline in the cache, the number of unread messages since the fully read marker (exposed via Room::unreadCount() or, since v0.7, via Room::partiallyReadStats().notableCount) is saved to the cache. Since the cache is saved from an existing or empty timeline but is always read into an empty timeline, one should be careful with the resulting room state discrepancy. With c being the count of notable events (i.e. those used by the unread counting logic defined in MSC2654), the possible cases are as follows:

  1. Local timeline empty when saving: restoring as is in all cases
    i. c == -1: "all read"
    ii. c >= 0: "at least c unread"
  2. Local timeline non-empty when saving:
    i. Read marker outside the timeline:
      1. c == -1: invalid case, should be corrected to c = 0
      2. c >= 0: "at least c unread" (even for 0), restores to state 1ii
    ii. Read marker within the timeline:
      1. c == -1: "all read", restores to state 1i
      2. c > 0: "exactly c unread", restores to state 1ii
      3. c == 0: invalid case, should be corrected to c = -1
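
The two "invalid case" corrections in the list above amount to a small normalisation step applied when saving. The following is only a sketch of that case analysis: ReadMarkerPosition and normalizedCountForCache are hypothetical names, not library API, and the assertion that c is never below -1 is an assumption drawn from the cases listed.

```cpp
#include <cassert>

enum class ReadMarkerPosition { OutsideTimeline, WithinTimeline };

// Returns the value of c that is safe to write to the cache, given
// the state of the local timeline at the moment of saving
int normalizedCountForCache(int c, bool timelineEmpty,
                            ReadMarkerPosition marker)
{
    assert(c >= -1); // -1 means "all read"; other negatives are not expected

    if (timelineEmpty)
        return c; // saved as is in all cases

    if (marker == ReadMarkerPosition::OutsideTimeline)
        // Events before the timeline may be unread, so "all read" cannot
        // be claimed: -1 is corrected to 0, i.e. "at least 0 unread"
        return c == -1 ? 0 : c;

    // The marker is within the timeline, so the count is exact;
    // an exact count of 0 is the same as "all read"
    return c == 0 ? -1 : c;
}
```

With this normalisation, a restored value of -1 always reads as "all read" and any non-negative value as "at least c unread", which is the only interpretation available once the counter is read back into an empty timeline.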

Note: libQuotient 0.6.x only counted unread events since the fully read marker (and also tried to pin the read receipt to it in order to provide interoperability with the rest of Matrix without having full support for two markers). The definition of unread events used in 0.6.x was therefore different from that used by, e.g., Element, which counted unread events since the m.read receipt instead. With libQuotient 0.7 properly supporting both types of markers, there are also two ways to calculate unread events: since the m.fully_read marker and since the last read receipt (m.read). To distinguish between the two, the former range of events (since m.fully_read) is now referred to as partially read events, while the latter (since m.read) is called unread events, in line with the spec and most of the Matrix ecosystem. The logic described in this section applies specifically to partially read events (pre-0.7 unread events); the counter of "new" unread events is not saved in the cache and therefore has no special -1 value.