You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Describe the bug
On our MAS instance, every time we attempt to onboard a new user, they are prevented from completing login and shown error request failed with status 404 Not Found: M_NOT_FOUND: Unknown user. A sample of the MAS logs when this happens:
2024-10-09T19:19:13.339723Z INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 2 activity records to the database
2024-10-09T19:20:13.347047Z INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 1 activity records to the database
2024-10-09T19:20:50.876417Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBK4850FS9RTKXBYEX7JD4" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBK4850FS9RTKXBYEX7JD4}:homeserver.create_device{matrix.homeserver="devhack.net" matrix.mxid="@xxxx:devhack.net" matrix.device_id="LorCzfWcDe"}: mas_matrix_synapse: crates/matrix-synapse/src/lib.rs:345: error=Failed to create device in Synapse
Caused by:
request failed with status 404 Not Found: M_NOT_FOUND: Unknown user
2024-10-09T19:20:50.876491Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBK4850FS9RTKXBYEX7JD4" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBK4850FS9RTKXBYEX7JD4}: mas_handlers::compat::login_sso_complete: crates/handlers/src/compat/login_sso_complete.rs:117: error=Internal error: Failed to provision device (Failed to provision device
Caused by:
0: Failed to create device in Synapse
1: request failed with status 404 Not Found: M_NOT_FOUND: Unknown user)
2024-10-09T19:21:13.351558Z INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 1 activity records to the database
2024-10-09T19:21:24.690731Z ERROR http.server.request{otel.kind="server" otel.name="POST /upstream/link/:link_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/upstream/link/01J9SBK5JD1WXZP5JMA1XHZM9T" url.scheme="http" http.route="/upstream/link/:link_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.upstream_oauth2.link.post{upstream_oauth_link.id=01J9SBK5JD1WXZP5JMA1XHZM9T}: mas_handlers::upstream_oauth2::link: crates/handlers/src/upstream_oauth2/link.rs:469: error=Missing session cookie
2024-10-09T19:21:27.795058Z ERROR http.server.request{otel.kind="server" otel.name="GET /upstream/link/:link_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="GET" url.path="/upstream/link/01J9SBK5JD1WXZP5JMA1XHZM9T" url.scheme="http" http.route="/upstream/link/:link_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.upstream_oauth2.link.get{upstream_oauth_link.id=01J9SBK5JD1WXZP5JMA1XHZM9T}: mas_handlers::upstream_oauth2::link: crates/handlers/src/upstream_oauth2/link.rs:182: error=Missing session cookie
2024-10-09T19:21:38.304873Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBNDEA00KYXQKPV53857EF" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBNDEA00KYXQKPV53857EF}:homeserver.create_device{matrix.homeserver="devhack.net" matrix.mxid="@xxx:devhack.net" matrix.device_id="hYQWiVuKGg"}: mas_matrix_synapse: crates/matrix-synapse/src/lib.rs:345: error=Failed to create device in Synapse
Caused by:
request failed with status 404 Not Found: M_NOT_FOUND: Unknown user
2024-10-09T19:21:38.304945Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBNDEA00KYXQKPV53857EF" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBNDEA00KYXQKPV53857EF}: mas_handlers::compat::login_sso_complete: crates/handlers/src/compat/login_sso_complete.rs:117: error=Internal error: Failed to provision device (Failed to provision device
Caused by:
0: Failed to create device in Synapse
1: request failed with status 404 Not Found: M_NOT_FOUND: Unknown user)
2024-10-09T19:22:08.904046Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBPEFCDVKEZX7CGPTW0A46" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBPEFCDVKEZX7CGPTW0A46}:homeserver.create_device{matrix.homeserver="devhack.net" matrix.mxid="@xxx:devhack.net" matrix.device_id="CJirStFj8Q"}: mas_matrix_synapse: crates/matrix-synapse/src/lib.rs:345: error=Failed to create device in Synapse
Caused by:
request failed with status 404 Not Found: M_NOT_FOUND: Unknown user
2024-10-09T19:22:08.904114Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBPEFCDVKEZX7CGPTW0A46" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBPEFCDVKEZX7CGPTW0A46}: mas_handlers::compat::login_sso_complete: crates/handlers/src/compat/login_sso_complete.rs:117: error=Internal error: Failed to provision device (Failed to provision device
Caused by:
0: Failed to create device in Synapse
1: request failed with status 404 Not Found: M_NOT_FOUND: Unknown user)
2024-10-09T19:22:13.360865Z INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 2 activity records to the database
2024-10-09T19:23:00.007733Z INFO job.run{otel.kind="consumer" job.id=JID-01J9SBR3H0XHFN50P04ZX7P87M job.attempts=0 job.name="cleanup-expired-tokens"}: mas_tasks::database: crates/tasks/src/database.rs:62: cleaned up expired tokens count=1
2024-10-09T19:23:13.368808Z INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 1 activity records to the database
To Reproduce
Steps to reproduce the behavior:
Have an account on our Keycloak instance created.
Attempt to authenticate to our matrix server, browser is sent to MAS which sends you to Keycloak
Authenticate to Keycloak, get sent back to MAS.
Get error screen.
Expected behavior
User should be able to login to matrix normally.
Screenshots
Desktop (please complete the following information):
This seems to happen to people on many platforms.
Smartphone (please complete the following information):
This seems to happen to people on many platforms.
Additional context
Restarting the mas process (killing the pod) seems to fix the issue for the particular user. I'm not clear why.
When I asked about this in #matrix-auth:matrix.org the first time it happened, some workarounds were suggested. I tried manage provision-all-users at the time it was believed that the worker just got stuck. That seemed plausible, but it has since happened several times.
Since the first time it happened, we setup Prometheus metrics, which seem to show regularly scheduled cleanup-expired-tokens jobs continuing to complete, suggesting the jobs are not just stuck:
According to the logs, the errors happened at 19:20:50, 19:21:38, 19:22:08 and 19:24:20. I restarted the process shortly after that.
As far as I can tell, this is happening every single time we onboard someone.
I'm not sure how to go about debugging this. I'm happy to create new test users, inspect the database, pull metrics and logs, dump configs, etc as needed.
The text was updated successfully, but these errors were encountered:
We had similar errors, but resolved them by restarting MAS and running provision-all-users.
The whole thing felt a bit buggy for such a central authentication software. Maybe MAS can automatically try to reprovision the user in Synapse if it receives a 404 on device creation?
Describe the bug
On our MAS instance, every time we attempt to onboard a new user, they are prevented from completing login and shown error
request failed with status 404 Not Found: M_NOT_FOUND: Unknown user
. A sample of the MAS logs when this happens:To Reproduce
Steps to reproduce the behavior:
Expected behavior
User should be able to login to matrix normally.
Screenshots
Desktop (please complete the following information):
This seems to happen to people on many platforms.
Smartphone (please complete the following information):
This seems to happen to people on many platforms.
Additional context
#matrix-auth:matrix.org
the first time it happened, some workarounds were suggested. I triedmanage provision-all-users
at the time it was believed that the worker just got stuck. That seemed plausible, but it has since happened several times.cleanup-expired-tokens
jobs continuing to complete, suggesting the jobs are not just stuck:According to the logs, the errors happened at 19:20:50, 19:21:38, 19:22:08 and 19:24:20. I restarted the process shortly after that.
The text was updated successfully, but these errors were encountered: