Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Descriptor calculation in python #510

Merged

Conversation

RandomDefaultUser
Copy link
Member

TL;DR

This PR implements the calculation of bispectrum and Gaussian descriptors in python. This implementation is significantly slower than LAMMPS and has some other restrictions, but it is fast and general enough to be used as a fallback during development and CI testing. Its intended audience are developers and students who did not set up LAMMPS yet or the CI.

Goals

The idea of having MALA do bispectrum calculations independently of other codes had been flowing around for a while. One of the main issues with being dependent on external libraries, from my point of view, is getting started with the code. Multiple times I have interacted with students who just wanted to try out some small part of the code, but did not want to set up LAMMPS and QE themselves. Therefore this PR and a similar one for the QE part, which will follow, implement the basic functionality of the external libraries. This python implementation is NOT meant to replace LAMMPS in production - it is, depending on the size of the system, orders of magnitude slower. It is merely a fallback option. An actual alternative to LAMMPS may be developed from this code base, given enough time.

New features

  • Gaussian/Bispectrum descriptors work in python, default remains LAMMPS
    • MALA will automatically switch to python if no LAMMPS is detected, and inform users
  • Added tests
  • Updated documentation

Current limitations

  • Python implementation is roughly an order of magnitude slower for small systems and doesn't scale too great (may be up to 2-3 orders of magnitude for large systems)
  • Python implementation only works for ONE chemical element
  • Python implementation has now MPI or GPU support

Note on optimization

The python implementation of the descriptors was developed by first copying the code directly and then optimizing it just enough to be usable. LAMMPS is written in C++, and as such, many for-loops which are optimized by the compiler can be employed. This is drastically inefficient in python, so functions were rewritten to use optimized vector-operations (e.g. via numpy) where possible. This requires the precomputation of quite a few index arrays. Thus, this implementation is memory-intensive, which should not be a problem given the intended use.

There is still quite some optimization potential here. I have decided to not optimized this code further just now, since we do not know yet whether the bispectrum descriptors will always be used, or if, e.g. other types of neural networks will be used in the future. The implementation here is fast enough to be used for tests of small test systems during development, which is the sole purpose. If we eventually decide to stick with bispectrum descriptors and feed-forward neural networks, this code can be further optimized and refined. I have left some guidance in the code on what to try/what has already been done, should someone else want to give it a try.

Final note: if we want to ship MALA with its own bispectrum descriptor calculation to be used at scale, the best way would potentially be via self-maintained C++-functions.

@RandomDefaultUser RandomDefaultUser merged commit b9ee85d into mala-project:develop Apr 5, 2024
5 checks passed
@RandomDefaultUser RandomDefaultUser deleted the descriptors_python branch October 25, 2024 09:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant