Skip to content

Commit

Permalink
Remove rollback from __check_jobs_at_startup
Browse files Browse the repository at this point in the history
I think the premise that we want to do a rollback on exceptions in this
method is wrong (it **may** be correct apprach in other places in the
codebase e.g. in
`Tool.handle_single_execution()`). Here it prevents us from comitting
anything inside the with statement (as the job_wrapper.fail method
does).
Here's the simplified issue:

```shell
❯ python -i scripts/db_shell.py -c config/galaxy.yml
>>> with sa_session() as session, session.begin():
...      sa_session.execute(update(Job).where(Job.id == 1).values(state="error"))
...      sa_session.commit()
...      sa_session.execute(update(Job).where(Job.id == 1).values(state="ok"))
...      sa_session.commit()
...
<sqlalchemy.engine.cursor.LegacyCursorResult object at 0x11f1be350>
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "<string>", line 2, in execute
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1711, in execute
    conn = self._connection_for_bind(bind, close_with_result=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1552, in _connection_for_bind
    TransactionalContext._trans_ctx_check(self)
  File "/Users/mvandenb/src/galaxy/.venv/lib/python3.11/site-packages/sqlalchemy/engine/util.py", line 199, in _trans_ctx_check
    raise exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: Can't operate on closed transaction inside context manager.  Please complete the context manager before emitting further commands.
```

It is probably still worthwhile to have the job recovery be minimal and
do things such as calling the job wrapper fail method that does actual
work to the job handler as in
galaxyproject#17083, but that's
refactoring that can be done on the dev branch and it still seems risky
in the sense that we then need to be very careful in ensuring we don't
commit anywhere else inside the scope of the begin() statement.

Finally I don't think it makes sense that the startup check should
ever cause the boot process to fail. This isn't a misconfiguration
or even anything catastrophic for the remaining jobs and places
unnecessary stress on admins and can basically break at any time
and shouldn't cause a complete service failure.

Fixes galaxyproject#17079
  • Loading branch information
mvdbeek committed Nov 26, 2023
1 parent f450dee commit 57e8fe6
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions lib/galaxy/jobs/handler.py
Original file line number Diff line number Diff line change
Expand Up @@ -273,13 +273,13 @@ def __check_jobs_at_startup(self):
In case the activation is enforced it will filter out the jobs of inactive users.
"""
stmt = self._build_check_jobs_at_startup_statement()
with self.sa_session() as session, session.begin():
try:
for job in session.scalars(stmt):
with session.begin_nested():
self._check_job_at_startup(job)
finally:
session.commit()
with self.sa_session() as session:
for job in session.scalars(stmt):
try:
self._check_job_at_startup(job)
except Exception:
log.exception("Error recover while recovering job %s during application startup.", job.id)
session.commit()

def _check_job_at_startup(self, job):
if not self.app.toolbox.has_tool(job.tool_id, job.tool_version, exact=True):
Expand Down

0 comments on commit 57e8fe6

Please sign in to comment.