Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC1719: olm session unwedging #1719

Merged
merged 10 commits into from
Jun 4, 2019
59 changes: 59 additions & 0 deletions proposals/1719-olm_unwedging.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# Olm unwedging

Olm sessions sometimes get out of sync, resulting in undecryptable messages.
richvdh marked this conversation as resolved.
Show resolved Hide resolved
This can happen for several reasons. For example, if a user restores their
client state from a backup, the client will be using an old ratchet state
([riot-web#3822](https://github.com/vector-im/riot-web/issues/3822)). Or a
client might expire a one-time key that another client is trying to use
([riot-web#3309](https://github.com/vector-im/riot-web/issues/3309)). This
proposal documents a method for devices to create a new session to replace the
broken session.

## Proposal

When a device receives an olm-encrypted message that it cannot decrypt, it
should assume that the olm session has become corrupted and create a new olm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is already working in practice - but sometimes I receive undecryptable messages that I can decrypt after a little why. Possibly this MSC is the reason it does eventually become decrypted? But I was always chalking it up to server lag. If the latter is the case, shouldn't we wait a little bit before attempting to start another session? Or does rate limiting make this all ok anyways?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With megolm, if you receive a message that you can't decrypt, then it could be that the key will come in later, and then you'll be able to decrypt it. With olm, if you receive a message that you can't decrypt, then there's no way to recover, and waiting for a while won't help.

session to replace it. It should then send a dummy message, using that
session, to the other party in order to inform them of the new session. To
send a dummy message, clients may send an event with type `m.dummy`, and with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can m.dummy only be sent in the context of unwedging? If not, how should the client handle it? By ignoring the event the client opens itself up to problems similar to https://github.com/vector-im/riot-web/issues/3261

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably doesn't necessarily need to be in the context of unwedging, but I can't really think of any other reason a client would send a dummy message, and I think we could just define m.dummy as "Ignore this message". This is a to_device message, and won't be displayed in any way to the user, so I don't think it will introduce any problems in the timeline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought this was a timeline event. Would be good to clarify that it is a to_device event (if it isn't already - I'm willing to believe I just missed that).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of hidden in the fact that the message is olm-encrypted. I guess that technically, it could be a timeline event, if you used olm for your room encryption rather than megolm, but ... let's just say that people won't do that. ;)

empty contents.

In order to avoid creating too many extra sessions, a client should rate-limit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a client DoS another client by ignoring this rate limit? Does the server protect against this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A client could drain another client's one-time keys, preventing others from starting sessions. There isn't really a good way of fixing that. Note that this MSC doesn't change what a client is able to do, so any DoS that one client can perform with this MSC could also be done without the MSC.

the number of new sessions it creates per device that it receives a message
from; the client should not create a new session with another device if it has
already created one for that given device in the past 1 hour.

Clients may wish to take steps to mitigate the loss of the undecryptable
messages. For example, megolm sessions that were sent using the old session
would have been lost, so the client can send
[`m.room_key_request`](https://matrix.org/docs/spec/client_server/latest.html#m-room-key-request)
messages to re-request any megolm sessions that it is unable to decrypt.

The spec currently says, "If a client has multiple sessions established with
another device, it should use the session from which it last received a
message." (the last paragraph of the [`m.olm.v1.curve25519-aes-sha2`
section](https://matrix.org/docs/spec/client_server/r0.4.0.html#m-olm-v1-curve25519-aes-sha2)).
When comparing the time of the last received message for each session, the
client should only consider messages that were successfully decrypted,
and for sessions that have never received a message, it should use the creation
time of the session. The spec will be changed to read:

> If a client has multiple sessions established with another device, it should
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds like it fixes element-hq/element-web#2330, and maybe element-hq/element-web#4011 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 4011 was an implementation issue that was fixed by matrix-org/matrix-js-sdk#857. But this does fix 2330 -- I had been waiting for more testing before closing it.

> use the session from which it last received and successfully decrypted a
> message. For these purposes, a session that has not received any messages
> should use its creation time as the time that it last received a message.

## Tradeoffs

## Potential issues

## Security considerations

An attacker could use this to create a new session on a device that they are
able to read. However, this would require the attacker to have compromised the
device's keys.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about the crypto here, but this feels like it could weaken security? If an attacker has compromised the device keys can they restart a session easily today? Do we want to warn the user this has happened?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, clients would use the session with the smallest session ID, so an attacker could still create a new session to be used. They'd might just need to try a few times before they got a session ID that was small enough.

I should also note that Signal basically does the same thing as this proposal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signal's documentation for this is https://signal.org/docs/specifications/sesame/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, I guess

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Other people are doing it it must be fine (TM)" :p


## Conclusion

This proposal outlines how wedged olm sessions can be replaced by a new
session.