Hi,
we are using OrbitDB to develop a p2p chat application.
Some time ago we noticed that once in a while users are missing older messages (a "message" here is an entry in an EventStore). It happens rarely, but it has already happened at least a few times, and we are sure it wasn't a connection issue, simply because new messages were still arriving and could be sent without any problem.
Right now we rely on the replicate.progress event to send newly received messages to the frontend. After some intensive testing I managed to reproduce the broken state (one missing message) and gather some logs.
(Notice the missing "I received a message but Windows did not start replicating missing messages. Will it trigger now?" message on the left side.)
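For context, this is roughly how we wire it up on our side (a simplified sketch; `sendToFrontend` stands in for our own handler):

```js
// `orbitdb` is an initialized OrbitDB instance; the db is an EventStore (eventlog).
const db = await orbitdb.log('chat.messages')

db.events.on('replicate.progress', (address, hash, entry, progress, have) => {
  // Every entry reported by the Replicator is pushed straight to the UI,
  // so a missing 'replicate.progress' event means a missing chat message.
  sendToFrontend(entry.payload.value)
})
```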
What happened was that the replicate.progress event didn't fire for this particular message, because none of the conditions in onReplicationProgress (https://github.com/orbitdb/orbit-db-store/blob/main/src/Store.js#L98) were met. These are the logs from the application in the broken state. They are a bit messy because I was logging the db snapshot on every 'replicate.progress' event to see how the oplog was changing: app1MissingMessage.log
This is the final snapshot, which proves that the "missing" entry is in the local db; the information about receiving it just wasn't propagated: app1MissingMessagesFinalSnapshot.log
Looking at the logs of the last 3 messages, I noticed that the Replicator received the "Yes it did trigger" message before "I received a message but Windows did not start replicating missing messages. Will it trigger now?". I am not sure if this matters, but after "Yes it did trigger" the replicationStatus wasn't recalculated properly, and thus replicate.progress wasn't emitted:
entry.clock.time: 31
onReplicationProgress: I reconnected at 22:49. Will sending a message trigger replication?
onReplicationProgress -> (this._oplog.length + 1 | 43), (this.replicationStatus.progress | 43), {previousProgress | 42}, (this.replicationStatus.max | 44), (previousMax | 44)
entry.clock.time: 44
onReplicationProgress: Yes it did trigger
onReplicationProgress -> (this._oplog.length + 1 | 43), (this.replicationStatus.progress | 44), {previousProgress | 43}, (this.replicationStatus.max | 44), (previousMax | 44)
entry.clock.time: 43
onReplicationProgress: I received a message but Windows did not start replicating missing messages. Will it trigger now?
onReplicationProgress -> (this._oplog.length + 1 | 43), (this.replicationStatus.progress | 44), {previousProgress | 44}, (this.replicationStatus.max | 44), (previousMax | 44)
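If I plug the numbers from that last log line into the kind of comparison I believe onReplicationProgress makes (my paraphrase of the guards, not the literal Store.js code), every condition comes out false, which would explain the missing event:

```js
// Values taken from the last log line above (entry.clock.time: 43)
const oplogLengthPlusOne = 43 // this._oplog.length + 1
const progress = 44           // this.replicationStatus.progress
const previousProgress = 44
const max = 44                // this.replicationStatus.max
const previousMax = 44

// Rough paraphrase of the guards around Store.js#L98:
const shouldEmit =
  oplogLengthPlusOne > progress || // 43 > 44 -> false
  progress > previousProgress ||   // 44 > 44 -> false
  max > previousMax                // 44 > 44 -> false

console.log(shouldEmit) // false -> no 'replicate.progress' for this entry
```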
Unfortunately I don't have a working test yet, because the path to reproducing the problem is a bit random; opening and closing apps (i.e. peers) in some order seems to do the trick. I'll provide a test as soon as I have one.
Do you have any idea what could've happened here?
Sorry, I don't have a good idea of what is going on here. It's at least good to hear this behavior is rare, as it's less of a blocker that way. In my opinion the replicator needs to be rewritten, which I'm in the process of doing as part of orbitdb.
Well, it's rare, but it's not that rare. There are only a handful of us testing the orbitdb-based application, and we've already seen it several times in our casual use of the app.
And "everyone converges to the same state" is the core feature of a CRDT, so if OrbitDB doesn't do this reliably and deterministically that's a big deal.
Are there any options beyond "wait for rewrite"? For example:
If this is a regression, can OrbitDB revert the change that brought it in?
Was it @haadcode that worked on the replicator refactor? Would they have any ideas of what might be going wrong?
Can we invest in some testing infrastructure around this so that, at the very least, when the rewrite is ready we can be confident that the problem is no longer happening? This seems like the most important thing.
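Even a rough convergence test would help, something along these lines (just a sketch; createPeer, waitFor and restart are placeholders for whatever harness we'd build, not existing APIs):

```js
// Two peers write concurrently while one of them is repeatedly restarted,
// which is roughly how the bug was hit manually. At the end both oplogs
// must contain every entry.
async function testEventualConvergence () {
  const peerA = await createPeer()
  const peerB = await createPeer()

  const dbA = await peerA.orbitdb.log('convergence-test')
  const dbB = await peerB.orbitdb.log(dbA.address.toString())

  for (let i = 0; i < 50; i++) {
    await (i % 2 === 0 ? dbA : dbB).add(`message ${i}`)
    if (i % 10 === 0) await peerB.restart() // open/close a peer, as in the repro
  }

  await waitFor(() =>
    dbA.iterator({ limit: -1 }).collect().length === 50 &&
    dbB.iterator({ limit: -1 }).collect().length === 50)
}
```

Ideally the test would also assert that a 'replicate.progress' (or equivalent) event was emitted for every entry, since that is what the frontend relies on.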
Also, when should we expect the work you're doing to be ready?
orbit-db: 0.28.3
orbit-db-store: 4.3.3