Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DirectConnection causing document state corruption #832

Closed
raimohanska opened this issue Jun 13, 2024 · 3 comments · Fixed by #834
Closed

DirectConnection causing document state corruption #832

raimohanska opened this issue Jun 13, 2024 · 3 comments · Fixed by #834
Assignees
Labels
bug Something isn't working

Comments

@raimohanska
Copy link

Description

If you use a DirectConnection when there are no WebSocket connections for a document, the WebSocket connections established during the next 2 seconds (or debounce time) will end up using an orphaned document and will not sync changes properly with clients that establish connections after that 2 second period. The problem lies in that storeDocumentHooks is called directly after closing the DirectConnection, but also in a debounced manner a bit later. Details follow.

Steps to reproduce the bug

  1. Start from a situation where there are no connections for document doc1.
  2. Establish a direct connection on the server, transact something and disconnect the connection.
  3. At this stage storeDocumentHooks has now beed called twice (once for the (transact and once for the disconnect). As the result of the latter, the unloadDocument method has been called and the doc1 has beed removed from this.documents
  4. All fine, except that there's still a debounced call for storeDocumentHooks to be performed in 2 seconds or so.
  5. Let's say a WebSocket client connects at this stage. Because the doc1 is not loaded yet, it will now be loaded
  6. After 2 seconds, the debounced storeDocumentHooks call kicks in. This call is bound to the earlier Y.Doc instance that no longer is referenced in this.documents (there's no a new copy of doc1 loaded). This call results into another call to unloadDocument which, on this line removed the document from this.documents. Unfortunately, it ends up removing the new document instance, which is now referenced by the WebSocket client which connected in step 5.
  7. Now the WebSocket client is connected to an orphaned document not in this.documents. Any client connecting from now on, will be assigned to a new document instance and the clients will not see changes from each other.

Expected behavior

When the DirectConnection is closed and there are no more clients for doc1, we should not unload the document when there is still a debounced storeDocumentHooks call left to be performed. Or alternatively, at least make sure that the debounced call does not end up removing the newer, active document from this.documents.

In my project, I made a quick patch for this line to additionally check that the document currently in this.docs is actually same document that's now being unloaded.

The patch solution is suboptimal though at least in the sense that two copies of the same document will exist in server memory for a time. I'd prolly rather have it so that the document accessed by DirectConnection would be subject to the same lifecycle as with WebSocket connections - that is, to be unloaded only after the normal debounce period. Yet, I'm not intimately familiar with the server design, so my hunch may be misguided.

@raimohanska raimohanska added the bug Something isn't working label Jun 13, 2024
janthurau added a commit that referenced this issue Jun 13, 2024
@janthurau janthurau mentioned this issue Jun 13, 2024
@janthurau
Copy link
Collaborator

Thanks a lot for this @raimohanska!

I think I've found a way (I'm just using the debouncer in storeDocumentHooks), but will need to do some testing. I haven't reproduced your issue yet, but it makes sense. I'm spending some time now testing / reproducing it.

janthurau added a commit that referenced this issue Jun 13, 2024
@janthurau
Copy link
Collaborator

aha! This only happens when you start a transaction and pass an origin, otherwise it works (L404 in Hocuspocus.ts is the guard).

I've pushed a fix in #834 and added a reproduction test case. Can you verify if that works for you? 🙏

@raimohanska
Copy link
Author

Great, thanks! I'll have a look tomorrow and see if I can verify the fix.

Btw the redis-specific check on L404 looks quite hacky 😂

janthurau added a commit that referenced this issue Jun 14, 2024
* fixes #832

* adds reproduction test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants