Possible back end issues with new unified login? #3726

misaugstad · 2024-11-05T02:44:50Z

Brief description of problem/feature

I know that @yeisenberg was trying to look at the Mendota admin page earlier today. He said that, surprisingly, he got it to load once, but that it failed later. It has always failed for me since unified login, timing out at 60 seconds before it finishes loading.

But then I noticed that the Mendota server restarted itself this morning as well. My guess is that it happened while Yochai or I were trying to access the admin page. Here are some of the errors that I see in the logs (pasted only the part of the stack trace that includes references to our code):

2024-11-04 11:18:24,426 - [ERROR] - from application in play-akka.actor.default-dispatcher-2480
Internal server error, for (HEAD) [/signIn] ->

play.api.Application$$anon$1: Execution exception[[SQLException: Timed out waiting for a free available connection.]]
...
        at scala.slick.jdbc.PlayDatabase.withTransaction(PlayDatabase.scala:6) ~[com.typesafe.play.play-slick_2.10-0.8.1.jar:0.8.1]
        at models.daos.slick.DBTableDefinitions$UserTable$.find(DBTableDefinitions.scala:66) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.7.jar:na]
        at controllers.UserController.logPageVisit(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$signIn$1.apply(UserController.scala:35) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$signIn$1.apply(UserController.scala:33) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
...

And then this:

2024-11-04 11:23:43,151 - [ERROR] - from play.nettyException in New I/O worker #46
Exception caught in Netty
java.lang.OutOfMemoryError: GC overhead limit exceeded
...
        at scala.slick.jdbc.PlayDatabase.withTransaction(PlayDatabase.scala:6) ~[com.typesafe.play.play-slick_2.10-0.8.1.jar:0.8.1]
        at models.daos.slick.DBTableDefinitions$UserTable$.find(DBTableDefinitions.scala:66) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.7.jar:na]
        at controllers.UserController.logPageVisit(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
...

Potential solution(s)

Since the authentication info is now centralized, I wonder if we're having problems with concurrent reads/writes. One thing we might want to do is check the authentication code for any places where we're using withTransaction instead of withSession when we don't need to. I assume that withTransaction essentially takes out a lock on a table in order to make updates to it, but there's rarely a case where we need it if we're just doing a read. And I specifically see scala.slick.jdbc.PlayDatabase.withTransaction mentioned in the stack traces above.
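
To make the withTransaction vs. withSession distinction concrete, here is a toy model of the two patterns. It is a sketch, not the play-slick source: `FakeConnection`, `SessionModel`, and `Demo` are hypothetical names, and the behavior shown (withTransaction turns off auto-commit and commits or rolls back at the end, rather than taking an explicit table lock) is my understanding of Slick 2.x and should be checked against the 0.8.1 jar we actually run.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a pooled JDBC connection; it only records calls.
final class FakeConnection {
  val calls = ListBuffer.empty[String]
  def setAutoCommit(on: Boolean): Unit = { calls += s"autoCommit=$on" }
  def commit(): Unit = { calls += "commit" }
  def rollback(): Unit = { calls += "rollback" }
  def close(): Unit = { calls += "close" } // i.e. hand the connection back to the pool
}

object SessionModel {
  // Borrow a connection, run the block, always return the connection.
  def withSession[A](conn: FakeConnection)(f: FakeConnection => A): A =
    try f(conn) finally conn.close()

  // Same borrow/return, plus an explicit transaction around the block.
  def withTransaction[A](conn: FakeConnection)(f: FakeConnection => A): A =
    withSession(conn) { c =>
      c.setAutoCommit(false)
      try {
        val result = f(c)
        c.commit()
        result
      } catch {
        case e: Throwable =>
          c.rollback()
          throw e
      } finally c.setAutoCommit(true)
    }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val read = new FakeConnection
    SessionModel.withSession(read)(_ => "user row")
    println(read.calls.mkString(", "))

    val write = new FakeConnection
    SessionModel.withTransaction(write)(_ => "user row")
    println(write.calls.mkString(", "))
  }
}
```

In this model both patterns hold exactly one connection for the whole block, so swapping withTransaction for withSession on reads would save the extra transaction bookkeeping but would not, by itself, free up pooled connections any sooner.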

I could also spend some time looking into what various database connections are doing during sign in and when loading the admin page. More investigation needed!


jonfroehlich · 2024-11-05T04:07:32Z

Is this only affecting the admin page loading, or will normal users be affected too?


misaugstad · 2024-11-05T04:24:57Z

I really don't have much information, and I don't even know for sure that it's related to the admin page. I just know that Yochai and I tried to load the admin page on a server within a couple hours of when that server restarted (I only know it restarted because I got an email notification). I don't know if there was any specific negative effect on anyone, nor do I know what triggered it right now!

Just adding what little information I have for now, and as I get reports of problems over the next couple weeks, I'll continue to document here!

jonfroehlich · 2024-11-05T16:50:04Z

Gotcha. Thanks Mikey.

misaugstad · 2024-11-18T19:47:53Z

I've started attempting some fixes in #3741 and #3737.

I'm seeing a lot of errors where the server can't find a free available connection, even though our connection pool only has ~100 open connections (with our max set to 200, I believe). One thought I've had is to increase the max number of connections per city. We've adjusted the min number in the past (#3316) so that cities where nothing is happening don't hog idle connections. But maybe we should raise the cap on connections for cities when a lot of activity is happening? It's possible that we're having issues when clustering runs while the Admin page is loading, for example. The documentation is linked below. If we continue to run into problems, I'll look through all of these settings and try out some tweaks to see if we can make any headway.
https://www.playframework.com/documentation/2.3.x/SettingsJDBC#Configuring-the-JDBC-pool
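
For reference, the pool settings on that page would look roughly like this in application.conf. This is only a sketch: the key names are the ones I recall from the Play 2.3 JDBC settings page, while the database name `default` and every value are placeholders, not our current configuration.

```conf
# Hypothetical values for illustration only; "default" stands in for a city's database name.

# Number of pool partitions for this database
db.default.partitionCount=2

# Cap per partition; total cap for this database = partitionCount * maxConnectionsPerPartition
db.default.maxConnectionsPerPartition=10

# Floor per partition, kept low so idle cities don't hold connections
db.default.minConnectionsPerPartition=1

# How long a request waits for a free connection before failing
db.default.connectionTimeout=5 seconds
```

Raising the per-partition max for busy cities trades a higher total connection count on the database server for fewer timeouts, so any change here should be checked against the server-wide connection limit.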

misaugstad self-assigned this Nov 9, 2024

misaugstad added this to Mikey Task Board Nov 9, 2024

misaugstad mentioned this issue Nov 18, 2024

Small query performance improvements #3741

Merged

2 tasks

misaugstad mentioned this issue Nov 23, 2024

v8.0.2 #3753

Merged
