Dendrite 0.6.2 fails to sync/federate #2150
Comments
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). Can you please see if this behaviour is any better as of commit a2b4777? |
This comment was originally posted by @grisu48 at matrix-org/dendrite#2150 (comment). Hi, I'm seeing the same issue. I can send messages within the instance, but anything from outside doesn't come through. I'm seeing a lot of the following errors in my log:
@neilalexander I tried building the container from commit a2b4777; however, the issue persists and the above-stated error messages are still flowing. |
This comment was originally posted by @Undef-a at matrix-org/dendrite#2150 (comment). With a2b4777, federation wasn't perfect, but it was far better than 0.6.2 (where just about everything was locked up). |
This comment was originally posted by @grisu48 at matrix-org/dendrite#2150 (comment). I just updated to 0.6.3 and federation is still broken. When grepping for |
This comment was originally posted by @stintel at matrix-org/dendrite#2150 (comment). Broken for me as well in 0.6.3. Downgrading to 0.6.0 doesn't help. Downgrading further seems to require a database rollback, which is not an option as these backups have expired already. |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). Can you please try the latest |
This comment was originally posted by @grisu48 at matrix-org/dendrite#2150 (comment). I tried a new docker container with matrix-org/dendrite@5106cc8 and the issue is still present. In addition, the inter-instance communication is now broken as well. Also: when logging in via my Android phone, the app performed an initial sync, despite me being logged in before. When grepping for I'm attaching here the output of |
This comment was originally posted by @Undef-a at matrix-org/dendrite#2150 (comment). Using 5106cc8 helps a little. Now rooms all correctly sync up immediately after restarting dendrite. Over the next 10-30 minutes the problem rooms will start to drop events. |
This comment was originally posted by @stintel at matrix-org/dendrite#2150 (comment). Unfortunately 5106cc8 doesn't help at all here. I have not a single message after February 8 in any of the rooms I'm in. At that time I was running 0.6 since January 29. So for me the problem started with 0.6, but only ~10 days after upgrading to it. |
This comment was originally posted by @grisu48 at matrix-org/dendrite#2150 (comment). Short update: matrix-org/dendrite@5106cc8 seems to help. On Sunday after deploying my instance was still silent, however today I see that some of the noisier channels are finally filling again with messages from Sunday onwards. I still see a message hole between the date of updating to 0.6 and Sunday. I keep monitoring this issue, but I'm mildly optimistic that matrix-org/dendrite@5106cc8 might help to resolve the issue or at least help. |
This comment was originally posted by @alistair23 at matrix-org/dendrite#2150 (comment). I am having the same issue, lots of |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). FWIW Out of curiosity, are you all running the internal NATS deployment built into Dendrite or standalone NATS Server? If any of you are running a standalone NATS Server, which options are you running with? |
This comment was originally posted by @alistair23 at matrix-org/dendrite#2150 (comment). I deleted the jetstream dir (matrix-org/dendrite#2181) and now it appears to be working after waiting a while. I also bumped my max connections up to try and avoid matrix-org/dendrite#2173. I'm using the NATS built into Dendrite. |
This comment was originally posted by @grisu48 at matrix-org/dendrite#2150 (comment). I'm also using the NATS built directly into Dendrite. I am using the same config as provided in the project's YAML example:
# Configuration for NATS JetStream
jetstream:
  # A list of NATS Server addresses to connect to. If none are specified, an
  # internal NATS server will be started automatically when running Dendrite
  # in monolith mode. It is required to specify the address of at least one
  # NATS Server node if running in polylith mode.
  #addresses:
  #  - jetstream:4222

  # Keep all NATS streams in memory, rather than persisting it to the storage
  # path below. This option is present primarily for integration testing and
  # should not be used on a real world Dendrite deployment.
  in_memory: false

  # Persistent directory to store JetStream streams in. This directory
  # should be preserved across Dendrite restarts.
  storage_path: ./

  # The prefix to use for stream names for this homeserver - really only
  # useful if running more than one Dendrite on the same NATS deployment.
  topic_prefix: Dendrite

# Configuration for Prometheus metric collection. |
This comment was originally posted by @grisu48 at matrix-org/dendrite#2150 (comment). @neilalexander Do you want me to file another issue for the |
This comment was originally posted by @stintel at matrix-org/dendrite#2150 (comment).
Thanks, that seems to have helped. Also using built-in. |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). OK, so to understand what's really going on, I could use a goroutine trace and a profile from Dendrites that are experiencing these issues. To do this, you need to start Dendrite with the Then the next time you run into problems, capture the following profiles:
... and then upload all three files along with the commit ID that you are running — they don't contain configuration or anything sensitive (apart from possibly the folder names that Dendrite was built in) so should be safe to share. The two |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment).
A lot of those issues will be genuine connection errors or bad keys so I wouldn't worry about those log lines unless you are having problems with E2EE specifically — in that case best to open a separate issue. |
This comment was originally posted by @imyxh at matrix-org/dendrite#2150 (comment). I think I'm seeing this too. Deleting the jetstream folder and restarting dendrite does fix it but only temporarily and after a while my Element will stop connecting properly again. The log is surprisingly quiet for me though, other than the "context canceled" errors that occur after my client gives up, and the response.WriteHeader messages in #2123. |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). @imyxh Please follow the instructions a couple posts up and if you can supply profiles from the next time it happens, that’d be amazing. Deleting the entire JetStream folder is not ideal and doing so is a very good way for downstream components to get in an out-of-sync state with the roomserver, so I can’t recommend that as a fix. A much much safer approach if absolutely necessary is to delete just the |
This comment was originally posted by @imyxh at matrix-org/dendrite#2150 (comment).
Whoops, skipped over that. Here they are! https://upload.disroot.org/r/9QJS70Hn#Kbx45aT6u79C2hAcB8D6ReproE37SPN1s6aZxQvD90U= |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). @imyxh Thanks for these, the profiles are extremely useful. Can you please just confirm for me which commit ID of Dendrite you are running? I’m seeing a pattern in the goroutine trace — there are a few roomserver workers that are all blocked on the select query in Can you please also get a few more details for me:
Thanks! |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). @imyxh Actually, looking more closely, I suspect your specific issue may have been fixed already in #2178 — it’s just that it hasn’t made its way into a release yet. I can see this because your goroutine profiles claim to be stuck in One way to find out is to update to commit 5106cc8 or anything on the |
This comment was originally posted by @imyxh at matrix-org/dendrite#2150 (comment).
I just tested from latest git main and indeed, there is no problem :P Thanks for all your work! |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). @imyxh Glad to hear that’s helped — if you run into any more problems, please capture and chuck up some new profiles and we can look again. :-) You’ve also got headroom of 20 unused database connections so you could increase the roomserver’s |
This comment was originally posted by @grisu48 at matrix-org/dendrite#2150 (comment). Will there be a 0.6.4 release soon with the above mentioned PRs in it so we can switch from a custom commit back to the release channel? 🙂 |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). @grisu48 Yep, sometime this week. |
This comment was originally posted by @alistair23 at matrix-org/dendrite#2150 (comment). Using commit 002429c, I am unable to sign in with new Element sessions. I see some rooms, but the data isn't in sync, and on existing sessions federated messages aren't working. |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). @alistair23 What happens if you try to send outbound messages? |
This comment was originally posted by @alistair23 at matrix-org/dendrite#2150 (comment). If I send a message in an E2EE room on an existing session it seems to work, but the recipient can't decrypt it. On a new session it also seems to send, but it is sent unencrypted |
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). There have been a number of improvements in Dendrite 0.6.4, both for the original issue and for E2EE. If you still have outstanding issues, please test on the latest version and let me know how you get on. |
This comment was originally posted by @grisu48 at matrix-org/dendrite#2150 (comment). Hi! I've updated now to dendrite 0.6.4 using
Not sure if this is related to this issue or if this is a new one though. It might be related to matrix-org/dendrite#2222. In the logs I can't find anything that points at a cause. This is what I get when I send a message to a room on my instance:
I see a lot of those "Failed to retrieve any keys" warnings, but I don't know if they are related or not. |
This comment was originally posted by @stintel at matrix-org/dendrite#2150 (comment). After updating to 002429c, I wasn't receiving any messages. Deleting just the jetstream/$G/streams/DendriteInputRoomEvent directory didn't help, deleting the complete jetstream directory did. |
This comment was originally posted by @ElDifinitivo at matrix-org/dendrite#2150 (comment). Continuing to experience this issue with docker 0.6.4
Got the former point, but the latter is iffy. Intra-instance is working for me, but federation is incredibly spotty; some messages may send but most do not. And the logs are quiet for me as well.
I also deleted the jetstream directory and it removed the
|
This comment was originally posted by @neilalexander at matrix-org/dendrite#2150 (comment). @ElDifinitivo Built-in NATS or a standalone NATS Server? |
This comment was originally posted by @ElDifinitivo at matrix-org/dendrite#2150 (comment). @neilalexander Built-in monolith |
This comment was originally posted by @alistair23 at matrix-org/dendrite#2150 (comment). Any update? Now I'm seeing messages not going out |
This comment was originally posted by @grisu48 at matrix-org/dendrite#2150 (comment). I could finally solve this issue by deleting the old @alistair23 maybe that's also worth a shot for you? I just renamed the jetstream directory, and once everything worked, dumped it completely. |
This comment was originally posted by @Undef-a at matrix-org/dendrite#2150 (comment). I've also had a similar experience. After upgrading to 0.6.5, JetStream broke: it was unable to parse something in that directory. Since removing the directory, messages have shown up correctly nearly all the time for the past few days. However, #2142 (which I had thought was fixed, and then took for a symptom of this issue) is back. |
This comment was originally posted by @kegsay at matrix-org/dendrite#2150 (comment). Please open a new issue if this is still a problem. |
This issue was originally created by @Undef-a at matrix-org/dendrite#2150.
Background information
Description
Dendrite fails to receive new events for any room and fails to sync existing events to some clients as of version 0.6.2. In the clients, this shows up as either frozen rooms and disconnection messages (Element) or an initial sync that never finishes (FluffyChat and Hydrogen). Rolling back to 0.6.0 resolves the issue.
The following logs may be relevant:
NOTE: user data from the logs has been stripped.