
Intermittent db error - "server closed the connection unexpectedly" #571

Open
carrollsa opened this issue Sep 5, 2023 · 7 comments
carrollsa commented Sep 5, 2023

There is an intermittent error being thrown on the call to api/get_last_runs.

paws-compose-server  |   File "/app/./api/admin_api.py", line 241, in get_run_logs
paws-compose-server  |     result = connection.execute(q)
...
paws-compose-server  | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
paws-compose-server  |  This probably means the server terminated abnormally
paws-compose-server  |  before or while processing the request.
paws-compose-server  |
paws-compose-server  | [SQL: select keycol,valcol from kv_unique where keycol like '%%_update'; ]
paws-compose-server  | (Background on this error at: http://sqlalche.me/e/14/e3q8)
  • It seems to happen only when the container has recently been started.
  • It is most likely caused by the database not having finished spinning up, since I can't think of another reason a valid, parameterless SQL query would fail like this.
  • Testing manually a few times, the only way I could reproduce the error was to start the container and quickly navigate to the page that hits this endpoint; the call has consistently succeeded when I've waited a minute or so after the container starts.
  • A page refresh has fixed the issue in every instance I've encountered.

@c-simpson Does the above sound plausible? If so, I don't think this is much of an issue, but I wanted to track it.

@c-simpson

We sleep the server at startup so the DB has time to come up before we touch it. Do you see this in your server logs?
The "X already present" lines show successful selects against the DB.

------------STARTING Wed Sep  6 00:51:17 UTC 2023 ------------------
SLEEPING.. WAITING FOR DB
WAKING
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
d80cb6df0fa2 (head)
DB SETUP
         ...
{"event": "3 roles already present in DB, not creating", "level": "debug", "timestamp": 1693961485.3440847, "filename": "base_users.py", "func_name": "create_base_roles", "lineno": 46} 
{"event": "4 users already present in DB, not creating", "level": "debug", "timestamp": 1693961485.3868604, "filename": "base_users.py", "func_name": "create_base_users", "lineno": 91} 
{"event": "5 event types already present in DB, not creating", "level": "debug", "timestamp": 1693961485.3897622, "filename": "base_users.py", "func_name": "populate_sl_event_types", "lineno": 156}

@carrollsa

Alright, it seems worth keeping this issue open, but it's pretty low priority given that a refresh is enough to fix it and that the page won't be used much.


carrollsa commented Sep 13, 2023

We discovered a few things last night:

  • This issue is not limited to startup; it can appear intermittently even after the container has been running for a few days.
  • It is also not limited to the query made from the get_last_runs endpoint.

Currently testing a temporary solution of adding pool_pre_ping=True as an arg to sqlalchemy.create_engine().

Relevant logs:

....container startup logs

paws-compose-db      | 2023-09-12 23:50:17.730 UTC [32] LOG:  statement: BEGIN
paws-compose-db      | 2023-09-12 23:50:17.730 UTC [32] LOG:  statement: select valcol from kv_unique where keycol = 'last_execution_time';
paws-compose-db      | 2023-09-12 23:50:17.733 UTC [32] LOG:  statement: ROLLBACK
paws-compose-server  | [pid: 18|app: 0|req: 2/2] 172.30.0.4 () {54 vars in 1748 bytes} [Tue Sep 12 23:50:17 2023] GET /api/statistics => generated 2 bytes in 10 msecs (HTTP/1.0 200) 3 headers in 110 bytes (1 switches on core 1)
paws-compose-client  | 172.30.0.1 - - [12/Sep/2023:23:50:17 +0000] "GET /api/statistics HTTP/1.1" 200 2 "http://localhost/admin" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36" "-"
paws-compose-client  | 172.30.0.1 - - [12/Sep/2023:23:50:18 +0000] "GET /api/admin/user/get_users HTTP/1.1" 502 559 "http://localhost/users" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36" "-"
paws-compose-server  | Traceback (most recent call last):
paws-compose-client  | 2023/09/12 23:50:18 [error] 36#36: *1 upstream prematurely closed connection while reading response header from upstream, client: 172.30.0.1, server: localhost, request: "GET /api/admin/user/get_users HTTP/1.1", upstream: "http://172.30.0.3:5000/api/admin/user/get_users", host: "localhost", referrer: "http://localhost/users"
paws-compose-server  |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1770, in _execute_context
paws-compose-server  |     self.dialect.do_execute(
paws-compose-server  |   File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 717, in do_execute
paws-compose-server  |     cursor.execute(statement, parameters)
paws-compose-server  | psycopg2.OperationalError: server closed the connection unexpectedly
paws-compose-server  |  This probably means the server terminated abnormally
paws-compose-server  |  before or while processing the request.
paws-compose-server  |
paws-compose-server  |
paws-compose-server  | The above exception was the direct cause of the following exception:
paws-compose-server  |
paws-compose-server  | Traceback (most recent call last):
paws-compose-server  |   File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2464, in __call__

...Flask call stack

/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 717, in do_execute
paws-compose-server  |     cursor.execute(statement, parameters)
paws-compose-server  | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) server closed the connection unexpectedly
paws-compose-server  |  This probably means the server terminated abnormally
paws-compose-server  |  before or while processing the request.

# DB startup queries begin again here
paws-compose-server  | [SQL:  select username, full_name, active, pr.role
paws-compose-server  |             from pdp_users as pu
paws-compose-server  |             left join pdp_user_roles as pr on pu.role = pr._id
paws-compose-server  |             order by username ]
paws-compose-server  | (Background on this error at: http://sqlalche.me/e/14/e3q8)
paws-compose-server  | [pid: 16|app: 0|req: 1/3] 172.30.0.4 () {54 vars in 1768 bytes} [Tue Sep 12 23:50:18 2023] GET /api/admin/user/get_users => generated 0 bytes in 13 msecs (HTTP/1.0 500) 0 headers in 0 bytes (0 switches on core 0)
paws-compose-db      | 2023-09-12 23:54:07.864 UTC [32] LOG:  statement: BEGIN
paws-compose-db      | 2023-09-12 23:54:07.864 UTC [32] LOG:  statement: select active from pdp_users where username='base_admin'
paws-compose-db      | 2023-09-12 23:54:07.873 UTC [41] LOG:  statement: BEGIN
...

Not sure whether the truncated version above is all that helpful. All of the logs from the point of starting the container are here.

@c-simpson

@carrollsa Close this one given #573?

@carrollsa

> @carrollsa Close this one given #573?

Sounds good! I haven't encountered the error locally since upgrading.

@carrollsa carrollsa changed the title Error thrown on call to api/get_last_runs Intermittent db error - "server closed the connection unexpectedly" Jun 18, 2024
@carrollsa

Reopening this issue as the error is still present.

@carrollsa carrollsa reopened this Jun 18, 2024
@carrollsa

I found this Stack Overflow post referencing our problem, though it targets older versions of the libraries we're using. I think we can do the same thing just by adding pool_pre_ping=True to our create_engine call on line 70 of config.py:

engine = db.create_engine(DB, pool_pre_ping=True)

I haven't been able to reproduce the error at all today, even without this addition, but it feels like it may be what we need.
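For context on why pool_pre_ping should help: SQLAlchemy tests each pooled connection as it is checked out (roughly a "SELECT 1" round trip) and transparently opens a replacement if Postgres has silently dropped it, which matches the failure mode in the tracebacks above. A rough sketch of how the change could sit in config.py follows; the import alias, the DB URL variable, and the example query are assumptions based on the one-liner above and the logs, not the actual file.

# Sketch only: how the pool_pre_ping change might look in config.py.
# The import alias, DB URL, and example query are assumptions.
import os
import sqlalchemy as db

DB = os.environ.get("DATABASE_URL", "postgresql://postgres@paws-compose-db/paws")

# pool_pre_ping makes the pool verify each connection on checkout and replace
# stale ones, instead of handing a dead connection to a request handler.
engine = db.create_engine(DB, pool_pre_ping=True)

# Existing call sites don't need to change, e.g.:
with engine.connect() as connection:
    rows = connection.execute(
        db.text("select valcol from kv_unique where keycol = 'last_execution_time'")
    ).fetchall()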
