Possible back end issues with new unified login? #3726

Open
misaugstad opened this issue Nov 5, 2024 · 4 comments

@misaugstad
Member

Brief description of problem/feature

I know that @yeisenberg was trying to look at the Mendota admin page earlier today. He said that, surprisingly, he got it to load once, but that it failed later. It has always failed for me since unified login, timing out at 60 seconds before it finishes loading.

But then I noticed that the Mendota server restarted itself this morning as well. My guess is that it happened while Yochai or I were trying to access the admin page. Here are some of the errors that I see in the logs (I pasted only the part of each stack trace that references our code):

2024-11-04 11:18:24,426 - [ERROR] - from application in play-akka.actor.default-dispatcher-2480
Internal server error, for (HEAD) [/signIn] ->

play.api.Application$$anon$1: Execution exception[[SQLException: Timed out waiting for a free available connection.]]
...
        at scala.slick.jdbc.PlayDatabase.withTransaction(PlayDatabase.scala:6) ~[com.typesafe.play.play-slick_2.10-0.8.1.jar:0.8.1]
        at models.daos.slick.DBTableDefinitions$UserTable$.find(DBTableDefinitions.scala:66) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.7.jar:na]
        at controllers.UserController.logPageVisit(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$signIn$1.apply(UserController.scala:35) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$signIn$1.apply(UserController.scala:33) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
...

And then this:

2024-11-04 11:23:43,151 - [ERROR] - from play.nettyException in New I/O worker #46
Exception caught in Netty
java.lang.OutOfMemoryError: GC overhead limit exceeded
...
        at scala.slick.jdbc.PlayDatabase.withTransaction(PlayDatabase.scala:6) ~[com.typesafe.play.play-slick_2.10-0.8.1.jar:0.8.1]
        at models.daos.slick.DBTableDefinitions$UserTable$.find(DBTableDefinitions.scala:66) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
        at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.7.jar:na]
        at controllers.UserController.logPageVisit(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
...

Potential solution(s)

Since the authentication info is now centralized, I wonder if we're having problems with concurrent reads/writes. One thing we might want to do is check the authentication code for any places where we're using withTransaction when withSession would do. I assume that withTransaction essentially takes out a lock on a table in order to make updates to it, but there's rarely a case where we need it if we're just doing a read. And I specifically see scala.slick.jdbc.PlayDatabase.withTransaction mentioned in the stack traces above.
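
To make the withSession/withTransaction distinction concrete, here is a rough sketch, assuming play-slick 0.8.x on Play 2.3 (the versions named in the stack traces). The table, column, and method names are hypothetical and are not the project's actual code. As far as I know, withTransaction wraps the block in a JDBC transaction (commit at the end, rollback on an exception) rather than explicitly locking a table, and both variants still check a connection out of the pool, so switching reads to withSession may reduce overhead without fixing pool exhaustion on its own.

```scala
// Sketch only: assumes play-slick 0.8.x on Play 2.3 and must run inside a Play application.
// ExampleUser, ExampleUserTable, and the method names are hypothetical.
import play.api.Play.current
import play.api.db.slick.DB
import play.api.db.slick.Config.driver.simple._

case class ExampleUser(userId: String, username: String)

class ExampleUserTable(tag: Tag) extends Table[ExampleUser](tag, "example_user") {
  def userId = column[String]("user_id", O.PrimaryKey)
  def username = column[String]("username")
  def * = (userId, username) <> (ExampleUser.tupled, ExampleUser.unapply)
}

object ExampleUserQueries {
  val users = TableQuery[ExampleUserTable]

  // Read-only lookup: a plain session is enough, no transaction needed.
  def find(username: String): Option[ExampleUser] = DB.withSession { implicit session =>
    users.filter(_.username === username).firstOption
  }

  // Two statements that should succeed or fail together: keep the transaction here.
  def replace(user: ExampleUser): Int = DB.withTransaction { implicit session =>
    users.filter(_.userId === user.userId).delete
    users += user
  }
}
```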

I could also spend some time looking into what various database connections are doing during sign in and when loading the admin page. More investigation needed!

@jonfroehlich
Member

jonfroehlich commented Nov 5, 2024 via email

@misaugstad
Member Author

I really don't have much information, and I don't even know for sure that it's related to the admin page. I just know that Yochai and I tried to load the admin page on a server within a couple hours of when that server restarted (I only know it restarted because I got an email notification). I don't know if there was any specific negative effect on anyone, nor do I know what triggered it right now!

Just adding what little information I have for now, and as I get reports of problems over the next couple weeks, I'll continue to document here!

@jonfroehlich
Member

Gotcha. Thanks Mikey.

@misaugstad
Member Author

Started to make some attempts at fixes in #3741 and #3737

I'm seeing a lot of errors where the server can't find a free available connection, even though our connection pool only has ~100 open connections (with our max set to 200, I believe). One thought I've had is to increase the max number of connections per city. We've adjusted the min number in the past (#3316) so that cities where nothing is happening don't hog idle connections. But maybe we should raise the cap on connections for cities where a lot of activity is happening? It's possible that we're having issues when clustering runs while the Admin page is loading, for example. Documentation is linked below. If we continue to run into problems, I'll look through all of these settings and try out some tweaks to see if we can make any headway.
https://www.playframework.com/documentation/2.3.x/SettingsJDBC#Configuring-the-JDBC-pool
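
For reference, a sketch of the kind of pool settings that page covers, assuming the default BoneCP pool in Play 2.3 and a data source named `default`. The setting names are as I recall them from the linked documentation, and the values are placeholders for illustration, not our current configuration or a recommendation.

```
# application.conf (illustrative values only)
# Total pool size is partitionCount * maxConnectionsPerPartition.
db.default.partitionCount=2
db.default.maxConnectionsPerPartition=10
db.default.minConnectionsPerPartition=1

# How long a request waits for a free connection before failing.
db.default.connectionTimeout=5 seconds

# How long an unused connection may sit idle before being closed.
db.default.idleMaxAge=10 minutes
```

Raising the per-partition max only helps if the database server's own connection limit leaves room for it across all cities.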

@misaugstad misaugstad mentioned this issue Nov 23, 2024