v1.3.0 - Into the multi-GPU-niverse #616
…flow_test to Be_model
openPMD I/O: fix parallel flushing
Resolving pytest issues
It's better to use the DOI, which always points to the latest version of the test data repo. This avoids updating the CI in several places each time there is a new version of the test data repo. Co-authored-by: David Pape <[email protected]>
The top-level directory in the zip file is suffixed with a commit hash that relates to the downloaded test data repository. Subsequent steps in the pipeline expect this directory to have the name `test_data`. This snippet avoids manual renaming of the extracted folder with each newer version of the test data repository.
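The renaming step described above could look roughly like the following. This is only a sketch in Python using the standard library; the actual CI snippet is not shown here and may well be a shell one-liner. The function name `extract_as` and the archive layout (exactly one top-level directory) are assumptions made for illustration.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_as(zip_path: Path, dest_parent: Path, target_name: str = "test_data") -> Path:
    """Extract an archive that has one top-level directory and give it a fixed name.

    Hypothetical helper: assumes the archive contains exactly one top-level
    directory, whatever suffix (e.g. a commit hash) that directory carries.
    """
    target = dest_parent / target_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    with zipfile.ZipFile(zip_path) as archive:
        # The first path component of every member identifies the top-level entries.
        top_levels = {name.split("/", 1)[0] for name in archive.namelist() if name}
        if len(top_levels) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_levels)}")
        top = top_levels.pop()

        # Extract into a scratch directory first so a failure leaves no stray files.
        with tempfile.TemporaryDirectory(dir=dest_parent) as scratch:
            archive.extractall(scratch)
            extracted = Path(scratch) / top
            if not extracted.is_dir():
                raise ValueError(f"{top} is not a directory")
            shutil.move(str(extracted), str(target))

    return target
```

Because the directory name is read from the archive itself, nothing needs to change when a new version of the test data repository produces a different hash suffix.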
Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3, actions/cache@v3.
This fixes Node.js 16 deprecation warnings in cpu-tests.yml
Use RODARE API instead of hard-coded URL:
Remove caches after pushes to develop/master (+tags)
The diffs of the two Conda environments are now displayed next to each other to make it easier to spot a discrepancy between the two.
Enhance diff output of Conda environments
…npmd Fix CI installation of openPMD-api
This is a temporary fix to make the caching mechanism in the CI work again. It is currently broken due to a switch to BuildKit as the default builder for Docker Engine as of version 23.0 (2023-02-01).
Use legacy builder to build Docker image
- Update workflow name to match the style of the other workflows
- Fix: Node.js 16 actions are deprecated. Please update the following actions to use Node.js 20: actions/checkout@v3.
- Fix indentation issues
Update mirror-to-casus.yml
build_total_energy_energy_module.sh -> build_total_energy_module.sh. Link to the GitHub issue documenting issues when building QE with CMake.
doc: link to GPU usage docs from lammps install section
Quickfixing the ACSD
…xing necessary in the CI now
Align MALA version
Looks good to me. Fantastic job @RandomDefaultUser!
Recovering DDP scalability
The CI is currently failing because there is an "internal server error" being sent back by RODARE... I don't know why that is, but it is most likely only a temporary problem of RODARE. I will resubmit the CI later today, and if the error persists, on Monday. If it persists thereafter, I will contact RODARE staff.
Looks good to me, thank you @RandomDefaultUser!
New features
Multi-GPU inference: Models can now make predictions on an arbitrary number of GPUs
Multi-GPU training: Models can now be trained on an arbitrary number of GPUs
MALA now works with 2D materials, i.e., any system which is only periodic in two dimensions
Bispectrum descriptor calculation is now possible in Python
Logging for network training has been overhauled and now allows for the logging of multiple metrics
(EXPERIMENTAL) Implementation of a mutual information based metric to replace/complement the ACSD
(EXPERIMENTAL) Implementation of a class for LDOS alignment to a reference energy value; this can be useful for models across multiple mass densities
Changes to API/user experience
- `use_lammps` - enable/disable LAMMPS (enabled by default, recommended for optimal performance, will automatically be disabled if no LAMMPS is found on the machine)
- `use_atomic_density_formula` - enable the use of total energy evaluation based on a Gaussian representation (enabled if LAMMPS and GPU are enabled, recommended for optimal performance, details can be found in our paper on size transfer)
- `use_ddp` - enable/disable DDP, i.e., PyTorch's distributed training scheme (disabled by default)
- `SNAP` and all associated options are deprecated, use `Bispectrum` and associated options instead
- MALA now reads models with `load_run()` instead of `load_model`, which is more consistent with the rest of MALA
- `Tester` class has been improved, all errors and energy values reported there are now consistently given in meV/atom
Fixes