CrystalFormer is a transformer-based autoregressive model designed for space group-controlled generation of crystalline materials. Space group symmetry significantly simplifies the crystal space, which is crucial for data- and compute-efficient generative modeling of crystalline materials.
Generating Cs2ZnFe(CN)6 Crystal (mp-570545)
The model is an autoregressive transformer for the space group conditioned crystal probability distribution

P(C|g) = P(W_1|...) P(A_1|...) P(X_1|...) P(W_2|...) ... P(L|...),

where
- g: space group number, 1-230
- W: Wyckoff letter ('a', 'b', ..., 'A')
- A: atom type ('H', 'He', ..., 'Og')
- X: fractional coordinates
- L: lattice vector [a, b, c, alpha, beta, gamma]
- P(W_i|...) and P(A_i|...) are categorical distributions.
- P(X_i|...) is a mixture of von Mises distributions.
- P(L|...) is a mixture of Gaussian distributions.
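The von Mises distribution lives on a circle, which suits fractional coordinates because they are periodic (0 and 1 describe the same position). The following is a minimal sketch of drawing one coordinate from such a mixture using only the Python standard library. The function name `sample_fractional_coordinate` and the mixture parameters are hypothetical and chosen for illustration; in CrystalFormer the mixture parameters come from the transformer output, and its actual implementation may differ.

```python
import math
import random


def sample_fractional_coordinate(weights, means, kappas, rng):
    """Draw one fractional coordinate in [0, 1) from a von Mises mixture.

    weights: non-negative mixture weights (need not be normalized)
    means:   component mean angles in radians
    kappas:  component concentrations (>= 0)
    rng:     a random.Random instance
    """
    # Pick a mixture component with probability proportional to its weight.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # vonmisesvariate expects the mean angle in [0, 2*pi) and returns an angle
    # in the same range.
    theta = rng.vonmisesvariate(means[k] % (2 * math.pi), kappas[k])
    # Map the angle to a fractional coordinate; the modulo folds 1.0 back to 0.0.
    return (theta / (2 * math.pi)) % 1.0


# Illustrative (made-up) mixture with peaks at x = 0 and x = 0.5.
rng = random.Random(0)
samples = [
    sample_fractional_coordinate([0.7, 0.3], [0.0, math.pi], [50.0, 50.0], rng)
    for _ in range(1000)
]
print(min(samples), max(samples))
```

Samples from the first component cluster around both ends of the unit interval, since values slightly below 1 and slightly above 0 are neighbors on the circle.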
We only consider symmetry-inequivalent atoms. The remaining atoms are restored based on the space group and Wyckoff letter information. Note that there is a natural alphabetical ordering for the Wyckoff letters, starting with 'a' for a position with the site-symmetry group of maximal order and ending with the highest letter for the general position. The sampling procedure starts from higher-symmetry sites (with smaller multiplicities) and then goes on to lower-symmetry ones (with larger multiplicities). Fractional coordinates need to be considered in the loss or sampling only when the discrete Wyckoff letters do not fully determine the structure.
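The alphabetical ordering described above can be sketched as follows. This is an illustration only: the names `WYCKOFF_LETTERS`, `wyckoff_index`, and `sort_sites` are hypothetical and not taken from the CrystalFormer code base, and the example sites are an illustrative perovskite-like assignment rather than output of the model.

```python
import string

# Lowercase 'a'..'z' followed by 'A', matching the range of Wyckoff letters
# listed above ('a', 'b', ..., 'A').
WYCKOFF_LETTERS = string.ascii_lowercase + "A"


def wyckoff_index(letter):
    """Return the 1-based position of a Wyckoff letter in the alphabetical ordering."""
    return WYCKOFF_LETTERS.index(letter) + 1


def sort_sites(sites):
    """Order (wyckoff_letter, element) pairs from 'a' towards the general position."""
    return sorted(sites, key=lambda site: wyckoff_index(site[0]))


# Illustrative input: three symmetry-inequivalent atoms given in arbitrary order.
sites = [("c", "O"), ("a", "Sr"), ("b", "Ti")]
print(sort_sites(sites))
```

Because `sorted` is stable, atoms that share a Wyckoff letter keep their relative input order.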
Notebooks: The quickest way to get started with CrystalFormer is our notebooks in the Google Colab and Bohrium (Chinese version) platforms:
- CrystalFormer Quickstart : GUI notebook demonstrating the conditional generation of crystalline materials with CrystalFormer;
- CrystalFormer Application : Generating stable crystals with a given structure prototype. This workflow can be applied to tasks that are dominated by element substitution.
Create a new environment and install the required packages. We recommend using Python 3.10.* and conda to create the environment:
conda create -n crystalgpt python=3.10
conda activate crystalgpt
Before installing the required packages, you need to install jax and jaxlib first.
pip install -U "jax[cpu]"
If you intend to use CUDA (GPU) to speed up the training, it is important to install the appropriate version of jax and jaxlib. We recommend checking the jax docs for the installation guide. The basic installation command is given below:
pip install --upgrade pip
# NVIDIA CUDA 12 installation
# Note: wheels only available on linux.
pip install --upgrade "jax[cuda12]"
pip install -r requirements.txt
We release the weights of the model trained on the MP-20 dataset. More details can be seen in the model folder.
python ./main.py --folder ./data/ --train_path YOUR_PATH/mp_20/train.csv --valid_path YOUR_PATH/mp_20/val.csv
- folder: the folder to save the model and logs
- train_path: the path to the training dataset
- valid_path: the path to the validation dataset
- test_path: the path to the test dataset
python ./main.py --optimizer none --test_path YOUR_PATH/mp_20/test.csv --restore_path YOUR_MODEL_PATH --spacegroup 160 --num_samples 1000 --batchsize 1000 --temperature 1.0
- optimizer: the optimizer to use; none means no training, only sampling
- restore_path: the path to the model weights
- spacegroup: the space group number to sample
- num_samples: the number of samples to generate
- batchsize: the batch size for sampling
- temperature: the temperature for sampling
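As a reminder of what a sampling temperature usually does, here is a minimal sketch of the common convention for categorical distributions: logits are divided by the temperature before the softmax. This is an assumption about the general technique, not a description of CrystalFormer internals. How the temperature enters the continuous von Mises and Gaussian mixtures is not specified here, and `apply_temperature` is a hypothetical name.

```python
import math


def apply_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    print(t, apply_temperature(logits, t))
```

Under this convention, temperatures below 1 concentrate probability on the most likely choice, while temperatures above 1 flatten the distribution and give more diverse samples.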
You can also use --elements to restrict sampling to specific elements. For example, --elements La Ni O will sample structures with La, Ni, and O atoms. The sampling results will be saved in the output_LABEL.csv file, where LABEL is the space group number g specified by --spacegroup.
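When sampling several space groups or element sets, it can be convenient to assemble the command programmatically. The sketch below only uses the flags shown above. The helper name `build_sampling_command` is hypothetical, the paths are the same placeholders as in the command above, and the directory in which the output file is written is not assumed.

```python
def build_sampling_command(spacegroup, restore_path, test_path,
                           num_samples=1000, batchsize=1000,
                           temperature=1.0, elements=None):
    """Return (argv, output_filename) for one sampling run."""
    cmd = [
        "python", "./main.py",
        "--optimizer", "none",  # no training, only sampling
        "--test_path", test_path,
        "--restore_path", restore_path,
        "--spacegroup", str(spacegroup),
        "--num_samples", str(num_samples),
        "--batchsize", str(batchsize),
        "--temperature", str(temperature),
    ]
    if elements:
        # Optional element restriction, e.g. --elements La Ni O
        cmd += ["--elements", *elements]
    # Results are saved as output_LABEL.csv, with LABEL the space group number.
    return cmd, f"output_{spacegroup}.csv"


cmd, output_name = build_sampling_command(
    160, "YOUR_MODEL_PATH", "YOUR_PATH/mp_20/test.csv", elements=["La", "Ni", "O"]
)
print(" ".join(cmd))
print(output_name)
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```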
The input for --elements can also be a json file that specifies the atom mask for each Wyckoff site and the constraints. An example atoms.json file can be found in the data folder. There are two keys in the atoms.json file:
- atom_mask: sets the atom list for each Wyckoff position; the element at a Wyckoff position can only be selected from the corresponding list
- constraints: sets the constraints for the Wyckoff sites in the sampling; you can specify pairs of Wyckoff sites that should have the same elements
Before evaluating the generated structures, you need to transform the generated g, W, A, X, L to the cif format. You can use the following command to transform the generated structures to the cif format and save them as a csv file:
python ./scripts/awl2struct.py --output_path YOUR_PATH --label SPACE_GROUP --num_io_process 40
- output_path: the path to read the generated L, W, A, X and save the cif files
- label: the label used to save the cif files, which is the space group number g
- num_io_process: the number of processes
Calculate the structure and composition validity of the generated structures:
python ./scripts/compute_metrics.py --root_path YOUR_PATH --filename YOUR_FILE --num_io_process 40
- root_path: the path to the dataset
- filename: the filename of the generated structures
- num_io_process: the number of processes
Calculate the novelty and uniqueness of the generated structures:
python ./scripts/compute_metrics_matbench.py --train_path TRAIN_PATH --test_path TEST_PATH --gen_path GEN_PATH --output_path OUTPUT_PATH --label SPACE_GROUP --num_io_process 40
- train_path: the path to the training dataset
- test_path: the path to the test dataset
- gen_path: the path to the generated dataset
- output_path: the path to save the metrics results
- label: the label used to save the metrics results, which is the space group number g
- num_io_process: the number of processes
Note that the training, test, and generated datasets should contain the structures within the same space group g, which is specified by --label.
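To make the two metrics concrete, here is a toy sketch under one common convention: uniqueness is the fraction of generated samples that are distinct, and novelty is the fraction of those distinct samples that do not appear in the training set. These definitions are an assumption, and the exact ones used by compute_metrics_matbench.py may differ. The script compares crystal structures, whereas this sketch compares hashable fingerprints such as strings, and the name `uniqueness_and_novelty` is hypothetical.

```python
def uniqueness_and_novelty(generated, training):
    """Return (uniqueness, novelty) for lists of hashable structure fingerprints."""
    if not generated:
        raise ValueError("generated must not be empty")
    unique = set(generated)
    # Fraction of generated samples that are distinct from one another.
    uniqueness = len(unique) / len(generated)
    # Fraction of distinct generated samples not seen in the training set.
    novelty = len(unique - set(training)) / len(unique)
    return uniqueness, novelty


# Made-up fingerprints standing in for structures of one space group.
generated = ["A", "A", "B", "C"]
training = ["A", "D"]
print(uniqueness_and_novelty(generated, training))
```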
More details about the post-processing can be seen in the scripts folder.
@misc{cao2024space,
title={Space Group Informed Transformer for Crystalline Materials Generation},
author={Zhendong Cao and Xiaoshan Luo and Jian Lv and Lei Wang},
year={2024},
eprint={2403.15734},
archivePrefix={arXiv},
primaryClass={cond-mat.mtrl-sci}
}
Note: This project is unrelated to https://github.com/omron-sinicx/crystalformer with the same name.