[24.0] Fix deadlock that can occur when changing job state #17896

Merged: 3 commits into galaxyproject:release_24.0 on Apr 3, 2024

Member

@mvdbeek mvdbeek commented Apr 3, 2024

This is a deadlock that occurred on main:

DeadlockDetected: deadlock detected
DETAIL:  Process 116772 waits for ShareLock on transaction 5465347; blocked by process 3511756.
Process 3511756 waits for ShareLock on transaction 5465546; blocked by process 116772.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (3575330,77) in relation "job"

  File "sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
OperationalError: (psycopg2.errors.DeadlockDetected) deadlock detected
DETAIL:  Process 116772 waits for ShareLock on transaction 5465347; blocked by process 3511756.
Process 3511756 waits for ShareLock on transaction 5465546; blocked by process 116772.
HINT:  See server log for query details.
CONTEXT:  while updating tuple (3575330,77) in relation "job"

[SQL: UPDATE job SET update_time=%(update_time)s, state=%(state)s WHERE job.id = %(id_1)s AND (job.state NOT IN (%(state_1_1)s, %(state_1_2)s))]
[parameters: {'update_time': datetime.datetime(2024, 4, 2, 14, 58, 14, 475927), 'state': <JobState.PAUSED: 'paused'>, 'id_1': 56817470, 'state_1_1': <JobState.DELETING: 'deleting'>, 'state_1_2': <JobState.DELETED: 'deleted'>}]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
  File "galaxy/jobs/handler.py", line 563, in __handle_waiting_jobs
    job.set_state(model.Job.states.PAUSED)
  File "galaxy/model/__init__.py", line 1695, in set_state
    rval = session.execute(
  File "sqlalchemy/orm/session.py", line 1717, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "sqlalchemy/engine/base.py", line 1710, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "sqlalchemy/sql/elements.py", line 334, in _execute_on_connection
    return connection._execute_clauseelement(
  File "sqlalchemy/engine/base.py", line 1577, in _execute_clauseelement
    ret = self._execute_context(
  File "sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)

The other side of the deadlock was also setting the job state to paused. We can reduce the probability of this happening by shortening the window during which the transaction is open. This PR also excludes more states in the WHERE clause of the update.
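As a rough illustration of the idea described above (not the actual change in this PR), the sketch below issues the same kind of conditional UPDATE and commits immediately, so the row lock is held only for that one statement. It uses the standard-library `sqlite3` module as a stand-in for PostgreSQL and SQLAlchemy. The `set_state` helper, the `PROTECTED_STATES` tuple, and the minimal `job` table are assumptions made for the example. Which additional states the PR excludes is not stated here, so the list is left as a parameter.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical list of states that a state change must not overwrite.
# The traceback above shows 'deleting' and 'deleted'; extend as needed.
PROTECTED_STATES = ("deleting", "deleted")


def set_state(conn, job_id, new_state, protected=PROTECTED_STATES):
    """Conditionally update a job's state and commit right away.

    Returns True if the row was updated, False if the job was missing
    or already in one of the protected states.
    """
    placeholders = ", ".join("?" for _ in protected)
    sql = (
        "UPDATE job SET update_time = ?, state = ? "
        f"WHERE id = ? AND state NOT IN ({placeholders})"
    )
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(sql, (now, new_state, job_id, *protected))
    # Commit immediately: the lock taken by the UPDATE is released here
    # instead of staying held while unrelated work runs in the same transaction.
    conn.commit()
    return cur.rowcount == 1


def demo():
    """Build a tiny in-memory job table and try two state changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, state TEXT, update_time TEXT)"
    )
    conn.executemany(
        "INSERT INTO job (id, state) VALUES (?, ?)",
        [(1, "queued"), (2, "deleted")],
    )
    conn.commit()
    results = (set_state(conn, 1, "paused"), set_state(conn, 2, "paused"))
    states = [row[0] for row in conn.execute("SELECT state FROM job ORDER BY id")]
    conn.close()
    return results, states
```

Calling `demo()` pauses the queued job and leaves the deleted one untouched. A short transaction does not rule out a deadlock, but it narrows the interval in which two processes can each hold a lock that the other is waiting for.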

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek mvdbeek added the kind/bug and area/database (Galaxy's database or data access layer) labels Apr 3, 2024
lib/galaxy/model/__init__.py (two outdated review threads, resolved)
Co-authored-by: John Davis <[email protected]>
Member

@jdavcs jdavcs left a comment

2 minor edit suggestions for inline comments, the rest looks good. Thanks!

@github-actions github-actions bot added this to the 24.1 milestone Apr 3, 2024
@martenson martenson merged commit 4827ff8 into galaxyproject:release_24.0 Apr 3, 2024
47 of 49 checks passed