-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Descriptor calculation in python #510
Merged
RandomDefaultUser
merged 51 commits into
mala-project:develop
from
RandomDefaultUser:descriptors_python
Apr 5, 2024
Merged
Descriptor calculation in python #510
RandomDefaultUser
merged 51 commits into
mala-project:develop
from
RandomDefaultUser:descriptors_python
Apr 5, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
TL;DR
This PR implements the calculation of bispectrum and Gaussian descriptors in python. This implementation is significantly slower than LAMMPS and has some other restrictions, but it is fast and general enough to be used as a fallback during development and CI testing. Its intended audience are developers and students who did not set up LAMMPS yet or the CI.
Goals
The idea of having MALA do bispectrum calculations independently of other codes had been flowing around for a while. One of the main issues with being dependent on external libraries, from my point of view, is getting started with the code. Multiple times I have interacted with students who just wanted to try out some small part of the code, but did not want to set up LAMMPS and QE themselves. Therefore this PR and a similar one for the QE part, which will follow, implement the basic functionality of the external libraries. This python implementation is NOT meant to replace LAMMPS in production - it is, depending on the size of the system, orders of magnitude slower. It is merely a fallback option. An actual alternative to LAMMPS may be developed from this code base, given enough time.
New features
Current limitations
Note on optimization
The python implementation of the descriptors was developed by first copying the code directly and then optimizing it just enough to be usable. LAMMPS is written in C++, and as such, many for-loops which are optimized by the compiler can be employed. This is drastically inefficient in python, so functions were rewritten to use optimized vector-operations (e.g. via numpy) where possible. This requires the precomputation of quite a few index arrays. Thus, this implementation is memory-intensive, which should not be a problem given the intended use.
There is still quite some optimization potential here. I have decided to not optimized this code further just now, since we do not know yet whether the bispectrum descriptors will always be used, or if, e.g. other types of neural networks will be used in the future. The implementation here is fast enough to be used for tests of small test systems during development, which is the sole purpose. If we eventually decide to stick with bispectrum descriptors and feed-forward neural networks, this code can be further optimized and refined. I have left some guidance in the code on what to try/what has already been done, should someone else want to give it a try.
Final note: if we want to ship MALA with its own bispectrum descriptor calculation to be used at scale, the best way would potentially be via self-maintained C++-functions.