Merge pull request #17 from arcann-chem/docs_olaia
Docs olaia
Showing 14 changed files with 395 additions and 191 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
# Contributions # | ||
# Contributions | ||
|
||
We warmly welcome contributions to ArcaNN. If you have ideas, code contributions, or suggested optimizations, please feel free to submit them. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,38 +1,297 @@ | ||
# SN2 | ||
|
||
SHORT INTRO | ||
Here we introduce the basic usage of the ArcaNN software, illustrated by an SN2 reaction. All the files are available in the [GitHub Repository](https://github.com/arcann-chem/arcann_training/); after installing ArcaNN, you will find them at `examples/sn2_ch3cl_br/` inside your local `arcann_training` directory. | ||
|
||
The dataset generation for the SN2 reaction comprised two iterative trainings: a first non-reactive training was performed on reactant and product structures, followed by a reactive training in which transition structures were generated. | ||
|
||
The file setup for the non-reactive SN2 ArcaNN training is illustrated below. Then, the ArcaNN inputs for each step of the first iteration and the corresponding control `json` files are detailed. | ||
|
||
## User files | ||
We will start by creating a `user_files/` directory (see [Iterative procedure prerequisites](../usage/iter_prerequisites.md)) where we will include the necessary files for each step of the procedure. You also need to create a `data/` directory where the initial labeled datasets will be stored. For the reactive training, you will store the datasets of the non-reactive training in the corresponding `data/` directory, together with the initial datasets. | ||
|
||
For the non-reactive training, six systems were defined: three systems to explore the reactant basin (`ch3cl_br_close_300K`, `ch3cl_br_free_300K`, `ch3cl_br_smd_300K`) and three systems to explore the product basin (`ch3br_cl_close_300K`, `ch3br_cl_free_300K`, `ch3br_cl_smd_300K`). | ||
|
||
In the `user_files/` folder you will find the following files for each of the systems (for clarity, we only list the files of the `ch3cl_br_close_300K` system here). Note also that `hpc1` and `hpc2` are the machine keywords defined in the `machine.json` file, see [HPC Configuration](../getting-started/hpc_configuration.md). | ||
|
||
**JSON FILES** | ||
|
||
- `machine.json` : file containing the cluster parameters. | ||
- `dp_train_2.1.json` : input for DeePMD trainings. | ||
|
||
|
||
**JOB FILES** | ||
|
||
- `job_lammps-deepmd_explore_gpu_hpc1.sh` and `job-array_lammps-deepmd_explore_gpu_hpc1.sh` : job scripts for exploration | ||
- `job_CP2K_label_cpu_hpc1.sh` and `job-array_CP2K_label_hpc1.sh`: job scripts for labeling | ||
- `job_deepmd_compress_gpu_hpc1.sh`, `job_deepmd_freeze_gpu_hpc1.sh` and `job_deepmd_train_gpu_hpc1.sh` : job scripts for training | ||
|
||
|
||
**CP2K FILES** | ||
|
||
- `1_ch3cl_br_close_300K_labeling_XXXXX_hpc1.inp`, `2_ch3cl_br_close_300K_labeling_XXXXX_hpc1.inp`, `1_ch3cl_br_close_300K_labeling_XXXXX_hpc2.inp`, `2_ch3cl_br_close_300K_labeling_XXXXX_hpc2.inp` : inputs for CP2K labeling. There are 2 input files per system and per machine keyword, see details in [labeling](../labeling). | ||
|
||
|
||
**LAMMPS FILES** | ||
|
||
- `ch3cl_br_close_300K.lmp` : starting configurations for the first exploration in the LAMMPS format. | ||
- `ch3cl_br_close_300K.in` : inputs for LAMMPS exploration. | ||
|
||
- `plumed_SYSTEM_300K.dat` : PLUMED input files for the explorations. | ||
|
||
Additional plumed files can be used, and must be named as `plumed_KEYWORD_SYSTEM.dat`. Here, we used an additional plumed file to store colvars and another to define the key atoms : `plumed_colvars_ch3cl_br_close_300K.dat` and `plumed_atomdef_ch3cl_br_close_300K.dat`. | ||
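The naming convention above can be checked with a short script. This is a minimal sketch and not part of ArcaNN: the function `classify_plumed_file` is a hypothetical helper, and it assumes that the main PLUMED file is named `plumed_` followed by the full system name (temperature suffix included) and that the list of system names is already known.

```python
from typing import List, Optional, Tuple


def classify_plumed_file(
    filename: str, systems: List[str]
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (system, keyword) for a PLUMED file name, or None if it does not fit.

    The keyword is None for the main file `plumed_SYSTEM.dat`.
    """
    prefix, suffix = "plumed_", ".dat"
    if not (filename.startswith(prefix) and filename.endswith(suffix)):
        return None
    stem = filename[len(prefix):-len(suffix)]
    # System names contain underscores themselves, so we match against the
    # known names (longest first) instead of splitting the stem on "_".
    for system in sorted(systems, key=len, reverse=True):
        if stem == system:
            return system, None
        if stem.endswith("_" + system):
            return system, stem[: -(len(system) + 1)]
    return None


systems = ["ch3cl_br_close_300K"]
for name in (
    "plumed_ch3cl_br_close_300K.dat",
    "plumed_colvars_ch3cl_br_close_300K.dat",
    "plumed_atomdef_ch3cl_br_close_300K.dat",
):
    print(name, "->", classify_plumed_file(name, systems))
```

Matching against known system names avoids any ambiguity between underscores in the keyword and underscores in the system name.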
|
||
|
||
The atom order is defined in the `properties.txt` file. It ensures that the order of the atoms in the `SYSTEM.lmp` files matches the order indicated in the `"type_map"` keyword of the DeePMD-kit `dp_train_2.1.json` training file. It also ensures that the generated structures keep the correct atom numbering, to avoid conflicts. | ||
|
||
|
||
|
||
## User FILES | ||
### Machine.json | ||
### LMP FILES | ||
### PLUMED FILES | ||
### PROPERTIES FILES | ||
### JOB_FILES | ||
|
||
## Initialization | ||
|
||
### INPUTS | ||
Describe the used_inputs | ||
and which part is used in which phase | ||
### OUTPUTS | ||
Describe the control/*.json | ||
After the initialization step, a `default_input.json` file is generated, containing the names of the `LMP` systems found in `user_files/` and the default number of NNPs to train, as defined in ArcaNN. | ||
|
||
```JSON | ||
{ | ||
"systems_auto": ["ch3br_cl_close_300K", "ch3br_cl_free_300K", "ch3br_cl_smd_300K", "ch3cl_br_close_300K", "ch3cl_br_free_300K", "ch3cl_br_smd_300K"], | ||
"nnp_count": 3 | ||
} | ||
``` | ||
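The content of this file can be reproduced by hand to check that all systems were detected. The sketch below is only an illustration under two assumptions: that a system is identified by a `SYSTEM.lmp` file name, and that the names are listed alphabetically, as in the example above. The helper `systems_from_files` is hypothetical and is not the ArcaNN implementation.

```python
from pathlib import Path
from typing import Iterable, List


def systems_from_files(filenames: Iterable[str]) -> List[str]:
    """Return the sorted system names deduced from the `*.lmp` file names."""
    return sorted(Path(name).stem for name in filenames if Path(name).suffix == ".lmp")


# File names as they could appear in user_files/ (only two systems shown here).
user_files = [
    "ch3cl_br_close_300K.lmp",
    "ch3cl_br_close_300K.in",
    "ch3br_cl_close_300K.lmp",
    "machine.json",
]

default_input = {
    "systems_auto": systems_from_files(user_files),
    "nnp_count": 3,  # value taken from the example above
}
print(default_input)
```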
|
||
## Training | ||
|
||
### INPUTS | ||
### OUTPUTS | ||
You can now move to the `000-training` directory, which corresponds to the training of the first generation of NNPs. After running the `prepare` phase, a `default_input.json` file is created. To modify some of the default parameters, create an `input.json` file in the same directory that contains only the parameters to be updated, as follows: | ||
|
||
```JSON | ||
{ | ||
"user_machine_keyword_train": "v100_myproject1", | ||
"job_walltime_train_h": 12.0 | ||
} | ||
|
||
``` | ||
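The override logic described here amounts to a dictionary update. The following is a minimal sketch of that idea, not the ArcaNN code: `build_used_input` is a hypothetical helper and the default values are placeholders chosen for the illustration, not ArcaNN's real defaults.

```python
from typing import Any, Dict


def build_used_input(default_input: Dict[str, Any], user_input: Dict[str, Any]) -> Dict[str, Any]:
    """Return the defaults with the user-provided keys overriding them."""
    used_input = dict(default_input)  # copy, so the defaults stay untouched
    used_input.update(user_input)
    return used_input


# Placeholder defaults (hypothetical values, for illustration only).
default_input = {
    "user_machine_keyword_train": "placeholder_keyword",
    "job_walltime_train_h": 1.0,
    "numb_steps": 400000,
}

# Content of the input.json shown above.
user_input = {
    "user_machine_keyword_train": "v100_myproject1",
    "job_walltime_train_h": 12.0,
}

print(build_used_input(default_input, user_input))
```

Only the keys present in `input.json` change; every other parameter keeps its default value.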
|
||
Then, the input is updated and stored in the directory as `used_input.json`: | ||
|
||
|
||
```JSON | ||
{ | ||
"user_machine_keyword_train": "v100_myproject1", | ||
"user_machine_keyword_freeze": "v100_myproject1", | ||
"user_machine_keyword_compress": "v100_myproject1", | ||
"job_email": "", | ||
"use_initial_datasets": true, | ||
"use_extra_datasets": false, | ||
"deepmd_model_version": 2.1, | ||
"job_walltime_train_h": 12.0, | ||
"mean_s_per_step": 0.108, | ||
"start_lr": 0.001, | ||
"stop_lr": 1e-06, | ||
"decay_rate": 0.9172759353897796, | ||
"decay_steps": 5000, | ||
"decay_steps_fixed": false, | ||
"numb_steps": 400000, | ||
"numb_test": 0 | ||
} | ||
``` | ||
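The numerical values in this file can be cross-checked against each other. The sketch below assumes an exponential learning-rate schedule of the form `lr(step) = start_lr * decay_rate ** (step / decay_steps)` that reaches `stop_lr` after `numb_steps`. This is our reading of the numbers above, not a description of how ArcaNN computes them, and both helper functions are hypothetical.

```python
def decay_rate(start_lr: float, stop_lr: float, numb_steps: int, decay_steps: int) -> float:
    """Decay rate such that the learning rate goes from start_lr to stop_lr in numb_steps."""
    return (stop_lr / start_lr) ** (decay_steps / numb_steps)


def estimated_walltime_h(mean_s_per_step: float, numb_steps: int) -> float:
    """Training time in hours estimated from the mean time per step."""
    return mean_s_per_step * numb_steps / 3600.0


# Values taken from the used_input.json above.
print(decay_rate(start_lr=0.001, stop_lr=1e-06, numb_steps=400000, decay_steps=5000))
print(estimated_walltime_h(mean_s_per_step=0.108, numb_steps=400000))
```

With these values the first function gives approximately 0.91728, in agreement with `"decay_rate"`, and the second gives approximately 12 h, in agreement with `"job_walltime_train_h"`.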
|
||
The corresponding `control` file in your local `$WORKDIR/control/` is updated after the execution of each `phase`. Once the `000-training` step is finished, you will find the following `training_000.json` file: | ||
|
||
```JSON | ||
{ | ||
"user_machine_keyword_train": "v100_myproject1", | ||
"user_machine_keyword_freeze": "v100_myproject1", | ||
"user_machine_keyword_compress": "v100_myproject1", | ||
"job_email": "", | ||
"use_initial_datasets": true, | ||
"use_extra_datasets": false, | ||
"deepmd_model_version": 2.1, | ||
"job_walltime_train_h": 12.0, | ||
"mean_s_per_step": 0.039030916666666665, | ||
"start_lr": 0.001, | ||
"stop_lr": 1e-06, | ||
"decay_rate": 0.9172759353897796, | ||
"decay_steps": 5000, | ||
"decay_steps_fixed": false, | ||
"numb_steps": 400000, | ||
"numb_test": 0, | ||
"training_datasets": ["init_ch3br_cl_xxxxx_1001_4001_60", "init_ch3cl_br_xxxxx_1001_4001_60"], | ||
"trained_count": 1000, | ||
"initial_count": 1000, | ||
"added_auto_count": 0, | ||
"added_adhoc_count": 0, | ||
"added_auto_iter_count": 0, | ||
"added_adhoc_iter_count": 0, | ||
"extra_count": 0, | ||
"is_prepared": true, | ||
"is_launched": true, | ||
"is_checked": true, | ||
"is_freeze_launched": true, | ||
"is_frozen": true, | ||
"is_compress_launched": true, | ||
"is_compressed": true, | ||
"is_incremented": true, | ||
"min_nbor_dist": 0.9898124626241066, | ||
"max_nbor_size": [30, 45, 1, 1, 17], | ||
"median_s_per_step": 0.038560000000000004, | ||
"stdeviation_s_per_step": 0.0011691332942493009 | ||
} | ||
``` | ||
When a `phase` is executed successfully, the corresponding `"is_prepared"`, `"is_launched"`, `"is_checked"`, etc. keywords are set to `true`. | ||
Additional performance data, such as the mean time (`"mean_s_per_step"`), median time (`"median_s_per_step"`) and standard deviation (`"stdeviation_s_per_step"`) per training step are reported in this file. | ||
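These keywords make it easy to see at which point a step stands. As a minimal sketch, assuming only that the phase flags are the boolean keys starting with `is_` (as in the file above), the hypothetical helper `pending_phases` lists the flags that are not yet `true`:

```python
import json
from typing import Any, Dict, List


def pending_phases(control: Dict[str, Any]) -> List[str]:
    """Return the `is_*` keywords of a control file that are not set to true."""
    return [key for key, value in control.items() if key.startswith("is_") and value is not True]


# Excerpt of a control file for a training that was launched but not yet checked
# (constructed for this illustration).
control = json.loads(
    """
    {
        "numb_steps": 400000,
        "is_prepared": true,
        "is_launched": true,
        "is_checked": false,
        "is_frozen": false
    }
    """
)
print(pending_phases(control))
```

In practice the dictionary would be loaded from `$WORKDIR/control/training_000.json` with `json.load`.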
|
||
|
||
|
||
## Exploration | ||
|
||
### INPUTS | ||
### OUTPUTS | ||
After the first training step you now have starting NNPs that can be used to propagate reactive MD. After executing the `prepare` phase in the `001-exploration/` folder, you will obtain a `default_input.json` file with default values. | ||
|
||
For the first exploration we allow slightly larger deviations by setting the `"sigma_low"` keyword to 0.15 eV/Å. This is done by modifying the `input.json` file and running `prepare` again. | ||
|
||
```JSON | ||
{ | ||
"sigma_low": 0.15 | ||
} | ||
``` | ||
|
||
The `used_input.json` then becomes: | ||
```JSON | ||
{ | ||
"user_machine_keyword_exp": "v100_myproject1", | ||
"job_email": "", | ||
"atomsk_path": "/programs/apps/atomsk/0.13.1/atomsk", | ||
"vmd_path": "/prod/vmd/1.9.4a43/bin/vmd_LINUXAMD64", | ||
"exploration_type": ["lammps", "lammps", "lammps", "lammps", "lammps", "lammps"], | ||
"traj_count": [2, 2, 2, 2, 2, 2], | ||
"temperature_K": [300.0, 300.0, 300.0, 300.0, 300.0, 300.0], | ||
"timestep_ps": [0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005], | ||
"previous_start": [true, true, true, true, true, true], | ||
"disturbed_start": [false, false, false, false, false, false], | ||
"print_interval_mult": [0.01, 0.01, 0.01, 0.01, 0.01, 0.01], | ||
"job_walltime_h": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0], | ||
"exp_time_ps": [10.0, 10.0, 41.0, 10.0, 10.0, 41.0], | ||
"max_exp_time_ps": [400, 400, 400, 400, 400, 400], | ||
"max_candidates": [50, 50, 50, 50, 50, 50], | ||
"sigma_low": [0.15, 0.15, 0.15, 0.15, 0.15, 0.15], | ||
"sigma_high": [0.7, 0.7, 0.7, 0.7, 0.7, 0.7], | ||
"sigma_high_limit": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0], | ||
"ignore_first_x_ps": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5], | ||
"disturbed_start_value": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0], | ||
"disturbed_start_indexes": [[], [], [], [], [], []], | ||
"disturbed_candidate_value": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0], | ||
"disturbed_candidate_indexes": [[], [], [], [], [], []] | ||
} | ||
``` | ||
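In this example the single value given for `"sigma_low"` in `input.json` appears in `used_input.json` as one value per system. A minimal sketch of such an expansion is given below. It is written from the behaviour observed in this example, `per_system` is a hypothetical helper, and the handling of lists of the wrong length is our own assumption.

```python
from typing import Any, List


def per_system(value: Any, n_systems: int) -> List[Any]:
    """Expand a single value to one value per system; check the length of a list."""
    if isinstance(value, list):
        if len(value) != n_systems:
            raise ValueError(f"expected {n_systems} values, got {len(value)}")
        return list(value)
    return [value] * n_systems


print(per_system(0.15, 6))                                 # single value, repeated for the 6 systems
print(per_system([10.0, 10.0, 41.0, 10.0, 10.0, 41.0], 6))  # list, kept as it is
```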
|
||
For the first iteration the default parameters are a good starting point. The `"traj_count"` keyword sets the number of simulations per NNP and per system to 2, and `"timestep_ps"` sets the timestep of the simulations to 0.0005 ps. The `"disturbed_candidate_value"` keywords are all set to 0, so no disturbance is applied to the candidate structures that will be added to the training set. | ||
|
||
To perform the explorations, one directory per system is created, in which there will be 3 subdirectories (one per trained NNP) `1/`, `2/` and `3/`, in which again there will be 2 subdirectories (by default) `0001/` and `0002/`. This means that a total of 36 MD trajectories will be performed for this first iteration. Be careful, the total exploration time can quickly become huge, especially if you have many systems. | ||
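The count of 36 trajectories and the corresponding total simulated time can be verified with a few lines. This sketch only uses values from the `used_input.json` above and the directory layout described in the previous paragraph; the helper `trajectory_dirs` is hypothetical.

```python
from typing import List

systems = [
    "ch3br_cl_close_300K", "ch3br_cl_free_300K", "ch3br_cl_smd_300K",
    "ch3cl_br_close_300K", "ch3cl_br_free_300K", "ch3cl_br_smd_300K",
]
nnp_count = 3
traj_count = [2, 2, 2, 2, 2, 2]
exp_time_ps = [10.0, 10.0, 41.0, 10.0, 10.0, 41.0]


def trajectory_dirs(systems: List[str], nnp_count: int, traj_count: List[int]) -> List[str]:
    """List one directory per system, per NNP and per trajectory."""
    dirs = []
    for system, n_traj in zip(systems, traj_count):
        for nnp in range(1, nnp_count + 1):
            for traj in range(1, n_traj + 1):
                dirs.append(f"{system}/{nnp}/{traj:04d}")
    return dirs


dirs = trajectory_dirs(systems, nnp_count, traj_count)
total_time_ps = sum(t * nnp_count * n for t, n in zip(exp_time_ps, traj_count))

print(len(dirs), "trajectories, first one:", dirs[0])
print("total simulated time:", total_time_ps, "ps")
```

Since the total time grows with the product of the number of systems, NNPs and trajectories, it is worth estimating before launching the jobs.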
|
||
Let us now have a look at the `exploration_001.json` file inside the `$WORKDIR/control/` folder: | ||
|
||
```JSON | ||
{ | ||
"atomsk_path": "/programs/apps/atomsk/0.13.1/atomsk", | ||
"user_machine_keyword_exp": "v100_myproject1", | ||
"deepmd_model_version": 2.1, | ||
"nnp_count": 3, | ||
"systems_auto": { | ||
"ch3br_cl_close_300K": { | ||
// exploration parameters from used_input.json | ||
}, | ||
"ch3br_cl_free_300K": { | ||
// | ||
}, | ||
"ch3br_cl_smd_300K": { | ||
// | ||
}, | ||
"ch3cl_br_close_300K": { | ||
// | ||
}, | ||
"ch3cl_br_free_300K": { | ||
// | ||
}, | ||
"ch3cl_br_smd_300K": { | ||
// | ||
} | ||
}, | ||
"is_locked": true, | ||
"is_launched": true, | ||
"is_checked": true, | ||
"is_deviated": true, | ||
"is_extracted": true, | ||
"nb_sim": 36, | ||
"vmd_path": "/prod/vmd/1.9.4a43/bin/vmd_LINUXAMD64" | ||
} | ||
``` | ||
|
||
The total number of MD simulations is indicated by the `"nb_sim"` keyword. The `"vmd_path"` and the `"atomsk_path"` correspond to the ones indicated in the `used_input.json`, but are not necessary if the programs are already available in the ArcaNN path. When the `exploration` step has finished successfully, all the `phase` keywords are set to `true`. | ||
|
||
|
||
|
||
|
||
## Labeling | ||
|
||
### INPUTS | ||
### OUTPUTS | ||
For the last `step` of the first iteration, we move to the `$WORKDIR/001-labeling/` folder to run the different `phases`. You should adapt the Slurm parameters for the electronic structure calculation to match the architecture of your system. In this case, the number of MPI processes per node is set to 16 with the `"nb_mpi_per_node"` keyword in the `input.json`: | ||
|
||
```JSON | ||
{ | ||
"user_machine_keyword_label": "mykeyword1", | ||
"nb_mpi_per_node": 16 | ||
} | ||
``` | ||
|
||
As usual, the `used_input.json` file will be updated accordingly when re-running the `prepare` phase: | ||
|
||
```JSON | ||
{ | ||
"user_machine_keyword_label": "mykeyword1", | ||
"job_email": "", | ||
"labeling_program": "cp2k", | ||
"walltime_first_job_h": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5], | ||
"walltime_second_job_h": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0], | ||
"nb_nodes": [1, 1, 1, 1, 1, 1], | ||
"nb_mpi_per_node": [16, 16, 16, 16, 16, 16], | ||
"nb_threads_per_mpi": [1, 1, 1, 1, 1, 1] | ||
} | ||
``` | ||
The number of MPI processes has been set to 16 for the 6 systems. The walltimes of both calculations (2 calculations are performed when using CP2K: a first quick calculation at a lower level of theory, then one at the reference level) are kept at the default values. | ||
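To match these values to your cluster, it can help to compute the number of cores requested by each labeling job. The sketch below assumes one core per thread, which depends on your Slurm configuration; `cores_per_job` is a hypothetical helper.

```python
def cores_per_job(nb_nodes: int, nb_mpi_per_node: int, nb_threads_per_mpi: int) -> int:
    """Number of cores used by one job, assuming one core per thread."""
    return nb_nodes * nb_mpi_per_node * nb_threads_per_mpi


# Values from the used_input.json above (identical for the 6 systems).
print(cores_per_job(nb_nodes=1, nb_mpi_per_node=16, nb_threads_per_mpi=1))
```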
|
||
Here the reactive water calculations use full nodes and have a longer wall time of 1 h 30 min. The wall times must be set for the first iteration, but in later iterations they can be estimated automatically from the average time per CP2K calculation measured in the previous iteration. We can now run the first two phases and wait for the electronic structure calculations to finish. When running the `check` phase, a message may report failed configurations in the `water-reactive` folder. The calculations that did not converge are listed in the `water-reactive/water-reactive_step2_not_converged.txt` file. Suppose there were 2 failed jobs, the 13th and the 54th. We can simply run `touch water-reactive/00013/skip` and `touch water-reactive/00054/skip`, then run the `check` phase again. This time it will inform us that some configurations will be skipped, but the final message should report that the `check` phase succeeded. All that is left to do is to run the `extract` phase, clean up with the `clean` phase, store the wavefunctions, remove all unwanted data and finally update our local folder. We have now augmented our total training set and can start a new training iteration, iterating until convergence is reached. | ||
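When many calculations fail, the `skip` files can be created with a short script instead of individual `touch` commands. This is a minimal sketch that assumes the five-digit directory names used in the example above. The helpers `skip_paths` and `mark_skipped` are hypothetical and not part of ArcaNN.

```python
from pathlib import Path
from typing import Iterable, List


def skip_paths(system_dir: Path, failed_indices: Iterable[int]) -> List[Path]:
    """Return the path of the `skip` file for each failed calculation."""
    return [system_dir / f"{index:05d}" / "skip" for index in failed_indices]


def mark_skipped(system_dir: Path, failed_indices: Iterable[int]) -> None:
    """Create an empty `skip` file in the directory of each failed calculation."""
    for path in skip_paths(system_dir, failed_indices):
        if not path.parent.is_dir():
            # Refuse to create new directories: a wrong index should not pass silently.
            raise FileNotFoundError(f"no such calculation directory: {path.parent}")
        path.touch()


# Only prints the files that would be created for the 13th and 54th calculations.
for path in skip_paths(Path("water-reactive"), [13, 54]):
    print(path)
```

Calling `mark_skipped(Path("water-reactive"), [13, 54])` from the labeling folder is then equivalent to the two `touch` commands.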
|
||
|
||
Finally, we can check the `labeling_001.json` file in `$WORKDIR/control/`: | ||
|
||
```JSON | ||
{ | ||
"labeling_program": "cp2k", | ||
"user_machine_keyword_label": "mykeyword1", | ||
"systems_auto": { | ||
"ch3br_cl_close_300K": { | ||
// labeling parameters from used_input.json | ||
}, | ||
"ch3br_cl_free_300K": { | ||
// | ||
}, | ||
"ch3br_cl_smd_300K": { | ||
// | ||
}, | ||
"ch3cl_br_close_300K": { | ||
// | ||
}, | ||
"ch3cl_br_free_300K": { | ||
// | ||
}, | ||
"ch3cl_br_smd_300K": { | ||
} | ||
}, | ||
"total_to_label": 50, | ||
"launch_all_jobs": true, | ||
"is_locked": true, | ||
"is_launched": true, | ||
"is_checked": true, | ||
"is_extracted": true | ||
} | ||
``` | ||
|
||
## Test | ||
The total number of structures to be labeled, selected from the candidates of the previous exploration step, is indicated by the `"total_to_label"` keyword. | ||
|
||
### INPUTS | ||
### OUTPUTS | ||
The first iteration is done. After executing the `extract` phase, the directories for the next iteration will be created. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# ArcaNN Installation Guide # | ||
# ArcaNN Installation Guide | ||
|
||
## Installation on Machines with Internet Access ## | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
# ArcaNN Requirements # | ||
# ArcaNN Requirements | ||
|
||
## Installation Requirements ## | ||
|
||
|