-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the SqliteDosStorage
storage backend
#6148
Conversation
@sphuber this is awesome 👍. It would also be perfect if it was possible to use this for the test suite. Perhaps there could be some pytest switch that would allow changing the storage backend? It is also currently a bit annoying that the backend is loaded unconditionally, even if you run just a single test file that does not require it. Again, a pytest switch that disables the session fixture that loads the backend would be awesome (and I suspect would speed up iteration during development). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, great! I left a small comment for a documentation string (maybe you can check if there are other similar cases in the code).
I fully support Daniel's comment - maybe open an issue and we look into it once this is merged?
I also opened aiidateam/aiida-tutorials#468 for the future, when we'll prepare the next tutorial
Finally, one important note. In the previous coding week, we've been discussing having a documentation page for the various backends, and have it properly linked in the docs so people see it when they setup AiiDA. And this was combined with a simpler verdi profile setup mechanism. I think @superstar54 , @unkcpz and/or @eimrek might have been working on this? Has this ever been merged? I recommend adding the documentation for these new backend as well. Either we open an issue now and plan to fix it soon, or if that documentation is already in the docs, we could fix it directly in this PR
aiida/manage/configuration/config.py
Outdated
daemon__recursion_limit: int = Field(3000, description='Maximum recursion depth for the daemon workers.') | ||
db__batch_size: int = Field( | ||
100000, | ||
description='Batch size for bulk CREATE operations in the database. Avoids hitting MaxAllocSize of PostgreSQL ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall this still be speaking explicitly of PostgreSQL? Maybe OK to keep but add "E.g., to avoid hitting..."
Thanks for the review @giovannipizzi
Regarding switching off the automatic loading of the backend in the session fixture: we had discussed this in the past, but decided against it since it is used by plugins and so changing it would be breaking, as they would have to start adding it manually. I agree that having it off by default would still be better, so maybe we can revisit whether this is an acceptable breaking change for plugins. As for changing the backend plugin: I already have the required scaffolding in place in the
I was not aware of this effort but agree that this would be very useful. This ties neatly into why we cannot merge this PR just yet. There is still some functionality that is not supported by this backend, mostly related to the querybuilder:
Since this PR was anyway dependent on a bunch of other PRs being merged first, I had put these final details on hold. But will try to find time soon to work on them. Update:
It seems the part of JSON query functionality that is not supported includes at least the |
2155e14
to
aaadf59
Compare
@giovannipizzi I missed this comment, sorry. |
The PR of @superstar54 uses the |
|
026a253
to
758e00d
Compare
The implementation subclasses the `PsqlDosBackend` and replaces the PostgreSQL database with an sqlite database. By doing so, the initialization of the storage only requires a directory on the local file system where it will create the sqlite file for the database and a container for the disk-objectstore. The advantage of this `sqlite_dos` storage over the default `psql_dos` is that it doesn't require a system service like PostgreSQL. As a result, creating the storage is very straightforward and can be done with almost no setup. The advantage over the existing `sqlite_zip` is that the `sqlite_dos` is not read-only but can be used to write data as well. Combined with the `verdi profile setup` command, a working profile can be created with a single command: verdi profile setup core.sqlite_dos -n --profile name --email e@mail This makes this storage backend very useful for tutorials and demos that don't rely on performance.
The `store` method of the `SqliteEntityOverride` class, used by the `SqliteZipBackend` storage backend (and with that all other backends to subclass this), did not return `self`. This is in conflict with the signature of the base class that it is overriding. Since the `SqliteZipBackend` is read-only and so `store` would never be called, this problem went unnoticed. However, with the addition of the `SqliteDosStorage` backend which is *not* read-only, this bug would surface when trying to store a node since certain methods rely on this method returning the node instance itself.
The storage backends that use sqlite instead of PostgreSQL, i.e., `core.sqlite_dos`, `core.sqlite_temp` and `core.sqlite_zip`, piggy back of the ORM models defined by the `core.psql_dos` backend by dynamically converting to the sqlite equivalent database models. The current implementation of `SqlaGroup.count` would except when used with an sqlite backend since certain columns would be ambiguously defined: sqlite3.OperationalError: ambiguous column name: db_dbgroup.id This is fixed by explicitly wrapping the classes that are joined in `sqlalchemy.orm.aliased` which will force sqlalchemy to properly alias each class removing the ambiguity.
758e00d
to
8087995
Compare
The implementation subclasses the
PsqlDosBackend
and replaces thePostgreSQL database with an sqlite database. By doing so, the
initialization of the storage only requires a directory on the local
file system where it will create the sqlite file for the database and a
container for the disk-objectstore.
The advantage of this
sqlite_dos
storage over the defaultpsql_dos
is that it doesn't require a system service like PostgreSQL. As a
result, creating the storage is very straightforward and can be done
with almost no setup. The advantage over the existing
sqlite_zip
isthat the
sqlite_dos
is not read-only but can be used to write dataas well.
Combined with the
verdi profile setup
command, a working profile canbe created with a single command:
This makes this storage backend very useful for tutorials and demos
that don't rely on performance.