Possible back end issues with new unified login? #3726
Comments
Is this only affecting the admin page loading or will normal users be
affected too?
Sent from phone
On Mon, Nov 4, 2024 at 6:45 PM Michael Saugstad ***@***.***> wrote:
Brief description of problem/feature
I know that @yeisenberg <https://github.com/yeisenberg> was trying to
look at the Mendota admin page earlier today. He said that he got it to
load once, surprisingly! But that it failed later. It has always failed for
me since unified login, timing out at 60 seconds before it finishes loading.
But then I noticed that the Mendota server restarted itself this morning
as well. My guess is that it was while Yochai or I were trying to access the admin
page. Here are some of the errors that I see in the logs (pasted only the
part of the stack trace that includes references to our code):
2024-11-04 11:18:24,426 - [ERROR] - from application in play-akka.actor.default-dispatcher-2480
Internal server error, for (HEAD) [/signIn] ->
play.api.Application$$anon$1: Execution exception[[SQLException: Timed out waiting for a free available connection.]]
...
at scala.slick.jdbc.PlayDatabase.withTransaction(PlayDatabase.scala:6) ~[com.typesafe.play.play-slick_2.10-0.8.1.jar:0.8.1]
at models.daos.slick.DBTableDefinitions$UserTable$.find(DBTableDefinitions.scala:66) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.7.jar:na]
at controllers.UserController.logPageVisit(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
at controllers.UserController$$anonfun$signIn$1.apply(UserController.scala:35) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
at controllers.UserController$$anonfun$signIn$1.apply(UserController.scala:33) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
...
And then this:
2024-11-04 11:23:43,151 - [ERROR] - from play.nettyException in New I/O worker #46
Exception caught in Netty
java.lang.OutOfMemoryError: GC overhead limit exceeded
...
at scala.slick.jdbc.PlayDatabase.withTransaction(PlayDatabase.scala:6) ~[com.typesafe.play.play-slick_2.10-0.8.1.jar:0.8.1]
at models.daos.slick.DBTableDefinitions$UserTable$.find(DBTableDefinitions.scala:66) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
at controllers.UserController$$anonfun$2.apply(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
at scala.Option.getOrElse(Option.scala:120) ~[org.scala-lang.scala-library-2.10.7.jar:na]
at controllers.UserController.logPageVisit(UserController.scala:117) ~[sidewalk-webpage.sidewalk-webpage-8.0.0.jar:8.0.0]
...
Potential solution(s)
Since the authentication info is now centralized, I wonder if we're having
problems with concurrent reads/writes. One thing we might want to do is to
check the authentication code for any places where we're using withTransaction
instead of withSession when we don't *need* to. I assume that withTransaction
essentially takes out a lock on a table in order to make
updates to it. But there's rarely a case where we need to use
withTransaction if we're just doing a read. And I specifically see
scala.slick.jdbc.PlayDatabase.withTransaction mentioned in the stack traces above.
I could also spend some time looking into what various database
connections are doing during sign in and when loading the admin page. More
investigation needed!
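
To make the withSession vs. withTransaction distinction concrete, here is a minimal sketch. It is not the project's code: it assumes plain Slick 2.x with an in-memory H2 database (both need to be on the classpath) rather than the play-slick `DB` wrapper seen in the stack traces, and the `Users` table, `findUser`, and `replaceUser` are hypothetical names made up for illustration. The read uses a plain session, and only the multi-statement write asks for a transaction.

```scala
import scala.slick.driver.H2Driver.simple._

object SessionVsTransaction {
  // Hypothetical table for illustration only; not the project's real schema.
  class Users(tag: Tag) extends Table[(Int, String)](tag, "USERS") {
    def id = column[Int]("ID", O.PrimaryKey)
    def username = column[String]("USERNAME")
    def * = (id, username)
  }
  val users = TableQuery[Users]

  // In-memory H2 database; DB_CLOSE_DELAY=-1 keeps it alive between sessions.
  val db = Database.forURL("jdbc:h2:mem:sketch;DB_CLOSE_DELAY=-1", driver = "org.h2.Driver")

  // Create the table and one row. Call once.
  def setup(): Unit = db.withSession { implicit session =>
    users.ddl.create
    users += ((1, "alice"))
  }

  // Read-only lookup: a plain (auto-commit) session is enough here.
  def findUser(name: String): Option[(Int, String)] = db.withSession { implicit session =>
    users.filter(_.username === name).firstOption
  }

  // Two statements that should succeed or fail together: use a transaction.
  def replaceUser(id: Int, newName: String): Unit = db.withTransaction { implicit session =>
    users.filter(_.id === id).delete
    users += ((id, newName))
  }
}
```

Either kind of block borrows a connection and gives it back when the block ends, so keeping the blocks short matters in both cases. The transaction additionally has to commit or roll back, which is unnecessary work for a single read.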
I really don't have much information, and I don't even know for sure that it's related to the admin page. I just know that Yochai and I tried to load the admin page on a server within a couple hours of when that server restarted (I only know it restarted because I got an email notification). I don't know if there was any specific negative effect on anyone, nor do I know what triggered it right now! Just adding what little information I have for now, and as I get reports of problems over the next couple weeks, I'll continue to document here!
Gotcha. Thanks Mikey.
Started to make some attempts at fixes in #3741 and #3737. I'm seeing a lot of errors where the server can't find a free available connection, even though our connection pool only has ~100 open connections (with our max set to 200, I believe). One thought I've had is to try to increase the max number of connections per city. We've messed with the min number in the past (#3316) so that we don't have cities where nothing is happening hogging idle connections. But maybe we should raise the cap on connections for cities when a lot of activity is happening? It's possible that we're having issues when trying to run clustering while trying to load the Admin page at the same time, for example. Documentation below. If we continue to run into problems, I'll look through all of these settings and try out some tweaks to see if we can make any headway.
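
As a rough reference for which settings would be involved: the "Timed out waiting for a free available connection" message looks like it comes from BoneCP, the default pool in older Play versions, but that is an assumption about this deployment. If it holds, the per-database limits live in application.conf under keys like the ones below. The `default` database name and every value here are placeholders, not the project's actual configuration.

```
# Illustrative values only; assumes Play's BoneCP-based pool.
db.default.partitionCount=2
db.default.minConnectionsPerPartition=1
db.default.maxConnectionsPerPartition=10
# How long a request waits for a free connection before failing.
db.default.connectionTimeout=5 seconds
```

With BoneCP, the cap for one database is partitionCount times maxConnectionsPerPartition, so the placeholder values above would allow at most 20 connections for that database.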