diff --git a/doc/source/dev/database_session_management.md b/doc/source/dev/database_session_management.md
new file mode 100644
index 000000000000..80ae4ec4489a
--- /dev/null
+++ b/doc/source/dev/database_session_management.md
@@ -0,0 +1,32 @@
+# How SQLAlchemy sessions are handled in Galaxy
+
+## Access to the Session object registry
+
+We have a central resource from which to grab the Session object registry ([see sessionmaker in the SQLAlchemy documentation](https://docs.sqlalchemy.org/en/20/orm/session_api.html#sqlalchemy.orm.sessionmaker)): `app.model.session`.
+You can use it directly as a proxy to the current session, call `app.model.session()` to get the proxied Session object, or use the context manager (e.g. `with app.model.session() as session:`), as detailed in the SQLAlchemy documentation.
+
+## Session object registry scopes
+
+The default scope of the Session object registry is [thread local](https://docs.sqlalchemy.org/en/20/orm/contextual.html#thread-local-scope). This means that accessing a Session through the Session object registry returns the same Session within the same thread. This default scope can be changed if necessary.
+
+### WSGI, ASGI, Celery
+
+In the context of the web app (WSGI and ASGI) there is a clearly delineated lifespan for a session, and that lifespan isn't tied to a thread. The scope of the Session object registry is set at the beginning of a web request using `app.model.set_request_id(request_id)`. At the end of a web request we remove the scope using `app.model.unset_request_id(request_id)`. This also closes and removes any active Session object. Similarly, we set the scope for each Celery task execution using `set_request_id` and `unset_request_id`.
+
+By using this mechanism we ensure that each web request and each Celery task receives a new Session object and that any resources held by the session are released.
+
+### Job and Workflow handlers
+
+Job and Workflow handlers perform multiple pieces of business logic in separate threads, and care needs to be taken when deciding when to start a transaction, when to do a rollback, and if and where a session lifespan can be introduced.
+
+The following paragraph discusses the choices made in :
+
+The `JobHandlerQueue.monitor` method is executed as a thread within the job handler process,
+and all work that requires access to the database happens inside `check_watched_items`,
+which iterates over jobs assigned to this handler process. By setting the scope to a new,
+random UUID around `check_watched_items` we ensure that even if the session becomes corrupt
+the session is properly discarded and the next iteration of `check_watched_items` receives
+a new, clean session to work with. As the work within `JobHandlerQueue.monitor` happens within a single thread, it is not technically necessary to set a custom scope, and one could simply call `app.model.session.close`; however, there is very little cost associated with setting a custom scope, and it becomes very clear
+where the session lifespan starts and ends.
+
+As a guiding principle, an attempt should be made to manage the session state and database-related exception handling as high up in the call stack as possible.
diff --git a/doc/source/dev/index.rst b/doc/source/dev/index.rst
index acf36f427406..7d05a9d72ddf 100644
--- a/doc/source/dev/index.rst
+++ b/doc/source/dev/index.rst
@@ -13,6 +13,7 @@ A multi-hour long video playlist covering these slides can be found at
    schema
    api_guidelines
+   database_session_management
    build_a_job_runner
    finding_and_improving_slow_code
    data_managers