Alexandria database #107
Replies: 3 comments 2 replies
-
I think we can keep them, and for the modeling, just add a self-loop to include them as "graphs". Regarding hosting, Zenodo is great. Have you also looked into ColabFit as a place to host the dataset?
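To make the self-loop idea concrete, here is a minimal sketch. It assumes the structures in question are ones that would otherwise yield no edges (for example a single-site cell with a short cutoff). `build_edges` is a hypothetical helper, not part of the repository, and it ignores periodic images for brevity.

```python
from itertools import combinations
from math import dist


def build_edges(positions, cutoff):
    """Return directed (src, dst) pairs for atoms within `cutoff`.

    Hypothetical helper: periodic images are ignored, and any node left
    without a neighbor gets a self-loop so the structure is still a graph.
    """
    edges = []
    for i, j in combinations(range(len(positions)), 2):
        if dist(positions[i], positions[j]) <= cutoff:
            # Store both directions so message passing is symmetric.
            edges.extend([(i, j), (j, i)])
    connected = {src for src, _ in edges}
    for node in range(len(positions)):
        if node not in connected:
            # Isolated node: add a self-loop instead of dropping the sample.
            edges.append((node, node))
    return edges


single_atom = [(0.0, 0.0, 0.0)]
print(build_edges(single_atom, cutoff=5.0))  # prints [(0, 0)]
```

With this, a one-atom structure still has a non-empty edge list, so it can go through the same message-passing code path as every other sample.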
For my most recent paper on E(n)-GNN, this is the configuration I used:

model_class: PLEGNNBackbone
model_args:
  embed_in_dim: 256
  embed_hidden_dim: 1024
  embed_out_dim: 256
  embed_depth: 3
  embed_feat_dims: [256, 256, 256]
  embed_message_dims: [256, 256, 256]
  embed_position_dims: [64, 64]
  embed_edge_attributes_dim: 0
  embed_activation: silu
  embed_residual: True
  embed_normalize: True
  embed_tanh: True
  embed_activate_last: False
  embed_k_linears: 1
  embed_use_attention: False
  embed_attention_norm: sigmoid
  readout: sum
  node_projection_depth: 3
  node_projection_hidden_dim: 256
  node_projection_activation: silu
  prediction_out_dim: 1
  prediction_depth: 3
  prediction_hidden_dim: 128
  prediction_activation: relu
  num_atom_embedding: 201
  encoder_only: true

Will have to get back to you for FAENet :)
-
Thank you for the input parameters. So far the machine learning results look sensible.
-
This looks interesting. Right now the dataset is just the relaxed structures, but we also have ten to a hundred times as many geometry optimization steps. When we get to publishing them, that might be an option.
-
Hi,
As I previously mentioned, I would like to add the Alexandria database (https://alexandria.icams.rub.de/) to the repository.
The implementation seems to be quite straightforward after following @laserkelvin's tutorials, and I have some machine learning running with the data right now.
The dataset includes ~400k PBEsol and 400k SCAN calculations of relaxed crystal structures and ~4.4M PBE relaxed crystal structures, as well as 130k 2D and some 1D crystal structures. I am thinking of providing separate download options for the different datasets.
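As a rough illustration of what separate download options could look like, here is a small sketch of a subset registry. The subset keys, the `Subset` class, and `select_subsets` are all hypothetical names chosen for this example. The entry counts are the approximate figures quoted above, and no download locations are included because none have been fixed yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subset:
    name: str
    description: str
    approx_entries: Optional[int]  # None where no count was given


# Hypothetical registry; the keys are illustrative, not an agreed naming scheme.
SUBSETS = {
    "pbesol_3d": Subset("pbesol_3d", "PBEsol relaxed crystal structures", 400_000),
    "scan_3d": Subset("scan_3d", "SCAN relaxed crystal structures", 400_000),
    "pbe_3d": Subset("pbe_3d", "PBE relaxed crystal structures", 4_400_000),
    "2d": Subset("2d", "2D crystal structures", 130_000),
    "1d": Subset("1d", "1D crystal structures", None),
}


def select_subsets(names):
    """Return the requested subsets, rejecting unknown names early."""
    unknown = [n for n in names if n not in SUBSETS]
    if unknown:
        raise ValueError(f"Unknown subsets {unknown}; choose from {sorted(SUBSETS)}")
    return [SUBSETS[n] for n in names]


for subset in select_subsets(["scan_3d", "2d"]):
    print(subset.name, subset.approx_entries)
```

Keeping each subset as its own entry would let a user fetch, say, only the 2D structures without pulling the full ~4.4M PBE set.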
While one could set up a download through the OPTIMADE API, that is painfully slow, so we will probably just provide a link to some JSON files on Materials Cloud or Zenodo once we have something out on arXiv for the database, which should be rather soon. I will have to see if we can host the dataset at one of our universities until then.
The implementation of the dataset is rather similar to pymatgen, e.g. it uses parse_structure and parse_symmetry with a pymatgen Structure object as input to the functions.
I started an implementation at https://github.com/JonathanSchmidt1/matsciml_alexandria.
I would have two questions: